Awaiting all Scala Futures to complete - scala

I have a for comprehension like this using an image library that supports concurrent operations https://github.com/sksamuel/scrimage :
for (
file <- myDirectory.listFiles;
image <- AsyncImage(file);
scaled <- image.scale(0.5)
// possibly more ops here?
)
{
scaled.writer(Format.PNG).write(new File(dirOutput + file.getName))
}
here is the definition for write():
def write(out: OutputStream)(implicit executionContext: ExecutionContext): Future[Unit] = Future {
writer.write(out)
}
What I want to do is wait until all my images in my directory finish resizing before my program shutdowns. From what I gleaned from other posts on SO is to basically stuff all these Futures into a list of Futures and and use Await for that to finish... Can anyone help me out here?

The problem you are encountering is that you are mixing two different monadic comprehensions, i.e. listFiles returns a List[File] while image.scale returns a Future[Image]. While both can be used in a for-comprehension, they cannot be mixed. Also, image <- AsyncImage(file) should be an assignment, since it just returns an instance:
// start the scaling operations (this is a List comprehension)
val scaleOps: List[Future[Unit]] = for {
file <- myDirectory.listFiles
image = AsyncImage(file)
// Now transform the image into a Future[Unit] with a nested Future comprehension
ops = for {
scaled <- image.scale(0.5)
written <- scaled.writer(Format.PNG).write(new File(dirOutput + file.getName))
} yield { written }
} yield { ops }
// collect and await the results
Await.result(Future.sequence(scaleOps), 10 minutes)
In the outer comprehension <- always produces an item from a List and yield returns the final list.
In the inner comprehension <- always produces the value of the Future, allowing you to use it as input for another call producing a Future, hence producing a Future[Unit] out of the image.

Related

Why is Future considered to be "not referentially transparent"?

So I was reading the "Scala with Cats" book, and there was this sentence which I'm going to quote down here:
Note that Scala’s Futures aren’t a great example of pure functional programming because they aren’t referentially transparent.
And also, an example is provided as follows:
val future1 = {
// Initialize Random with a fixed seed:
val r = new Random(0L)
// nextInt has the side-effect of moving to
// the next random number in the sequence:
val x = Future(r.nextInt)
for {
a <- x
b <- x
} yield (a, b)
}
val future2 = {
val r = new Random(0L)
for {
a <- Future(r.nextInt)
b <- Future(r.nextInt)
} yield (a, b)
}
val result1 = Await.result(future1, 1.second)
// result1: (Int, Int) = (-1155484576, -1155484576)
val result2 = Await.result(future2, 1.second)
// result2: (Int, Int) = (-1155484576, -723955400)
I mean, I think it's because of the fact that r.nextInt is never referentially transparent, right? since identity(r.nextInt) would never be equal to identity(r.nextInt), does this mean that identity is not referentially transparent either? (or Identity monad, to have better comparisons with Future). If the expression being calculated is RT, then the Future would also be RT:
def foo(): Int = 42
val x = Future(foo())
Await.result(x, ...) == Await.result(Future(foo()), ...) // true
So as far as I can reason about the example, almost every function and Monad type should be non-RT. Or is there something special about Future? I also read this question and its answers, yet couldn't find what I was looking for.
You are actually right and you are touching one of the pickiest points of FP; at least in Scala.
Technically speaking, Future on its own is RT. The important thing is that different to IO it can't wrap non-RT things into an RT description. However, you can say the same of many other types like List, or Option; so why folks don't make a fuss about it?
Well, as with many things, the devil is in the details.
Contrary to List or Option, Future is typically used with non-RT things; e.g. an HTTP request or a database query. Thus, the emphasis folks give in showing that Future can't guarantee RT in those situations.
More importantly, there is only one reason to introduce Future on a codebase, concurrency (not to be confused with parallelism); otherwise, it would be the same as Try. Thus, controlling when and how those are executed is usually important.
Which is the reason why cats recommends the use of IO for all use cases of Future
Note: You can find a similar discussion on this cats PR and its linked discussions: https://github.com/typelevel/cats/pull/4182
So... the referential transparency simply means that you should be able to replace the reference with the actual thing (and vice versa) without changing the overall symatics or behaviour. Like mathematics is.
So, lets say you have x = 4 and y = 5, then x + y, 4 + y, x + 5, and 4 + 5 are pretty much the same thing. And can be replaced with each otherwhenever you want.
But... just look at following two things...
val f1 = Future { println("Hi") }
val f2 = f1
val f1 = Future { println("Hi") }
val f2 = Future { println("Hi") }
You can try to run it. The behaviour of these two programs is not going to be the same.
Scala Future are eagerly evaluated... which means that there is no way to actually write Future { println("Hi") } in your code without executing it as a seperate behaviour.
Keep in mind that this is not just linked to having side effects. Yes, the example which I used here with println was a side effect, but that was just to make the behaviour difference obvious to notice.
Even if you use something to suspend the side effect inside the Future, you will endup with two suspended side effects instead of one. And once these suspended side effects are passed to the interpreater, the same action will happen twice.
In following example, even if we suspend the print side-effect by wrapping it up in an IO, the expansive evaluation part of the program can still cause different behavours even if everything in the universe is exactly same for two cases.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
// cpu bound
// takes around 80 miliseconds
// we have only 1 core
def veryExpensiveComputation(input: Int): Int = ???
def impl1(): Unit = {
val f1 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val f2 = f1
val f3 = f1
val futures = Future.sequence(Seq(f1, f2, f3))
val ios = Await.result(futures, 100 milli)
}
def impl2(): Unit = {
val f1 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val f2 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val f3 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val futures = Future.sequence(Seq(f1, f2, f3))
val ios = Await.result(futures, 100 milli)
}
The first impl will cause only 1 expensive computation, but the second will trigger 3 expensive computations. And thus the program will fail with timeout in the second example.
If properly written with IO or ZIO (without Future), it with fail with timeout in both implementations.

Scala Future Type mismatch issue

I am running into an issue with Scala and multiple futures in a for yield scenario. Both f1 and f2 are futures. f2 future is based on a value obtained from f1 future.
val result = for {
f1 <- Await.result(dao.findNode("nodeA"), 5 seconds) // This returns back a MyNode class
f2 <- if (f1 != None && f1.isUpAndRunning)
Future { LookupResult(true, f1.varA, f1.varB) }
else
lk ? Lookup(sm.id, sm.("address"))
} yield(f1, f2)
Dependent on the result of f1 I either do a Lookup() when my if statement evaluates to false (which takes some time and returns back a LookupResult) or I mimic a default LookupResult.
Getting the following error back:
Type mismatch. Required: Option[B_], found: Future[(MyNode, Any)]
Am I just not mapping the result correctly? e.g. should I use asInstanceOf somewhere as whatever I do I cannot get this to compile.
Many thanks guys.
Chances are your dao.findNode method results in a Future[Option[MyNode]], so the Await.result makes f1 an Option[MyNode]. It's not that clear what type you want, but you can get a Future[(Option[MyNode], LookupResult)] with this
dao.findNode("nodeA"), 5.seconds)
.flatMap { nodeOpt: Option[MyNode] =>
nodeOpt.filter(_.isUpAndRunning)
.map { node =>
// node exists and is up and running
val finalResult = nodeOpt -> LookupResult(true, node.varA, node.varB)
// it's much more efficient in this scenario to use
// Future.successful than Future { }
Future.successful(finalResult)
}
.getOrElse {
// node didn't exist, or wasn't up and running
val lookupFut = lk ? Lookup(sm.id, sm.("address"))
lookupFut.map { lookupResult =>
nodeOpt -> lookupResult
}
}
}
The rough outline of what's happening is:
use dao to find the node
schedule a callback for when we get the (possibly nonexistent) node to check if it's up and running (in which case we short-circuit to our now-known final result), otherwise perform the lookup and build a final result if that lookup succeeds
Future.successful is used because Future {} will almost certainly incur more overhead scheduling a task on a thread and completing the future than constructing the final result.

Confused about cats-effect Async.memoize

I'm fairly new to cats-effect, but I think I am getting a handle on it. But I have come to a situation where I want to memoize the result of an IO, and it's not doing what I expect.
The function I want to memoize transforms String => String, but the transformation requires a network call, so it is implemented as a function String => IO[String]. In a non-IO world, I'd simply save the result of the call, but the defining function doesn't actually have access to it, as it doesn't execute until later. And if I save the constructed IO[String], it won't actually help, as that IO would repeat the network call every time it's used. So instead, I try to use Async.memoize, which has the following documentation:
Lazily memoizes f. For every time the returned F[F[A]] is bound, the
effect f will be performed at most once (when the inner F[A] is bound
the first time).
What I expect from memoize is a function that only ever executes once for a given input, AND where the contents of the returned IO are only ever evaluated once; in other words, I expect the resulting IO to act as if it were IO.pure(result), except the first time. But that's not what seems to be happening. Instead, I find that while the called function itself only executes once, the contents of the IO are still evaluated every time - exactly as would occur if I tried to naively save and reuse the IO.
I constructed an example to show the problem:
def plus1(num: Int): IO[Int] = {
println("foo")
IO(println("bar")) *> IO(num + 1)
}
var fooMap = Map[Int, IO[IO[Int]]]()
def mplus1(num: Int): IO[Int] = {
val check = fooMap.get(num)
val res = check.getOrElse {
val plus = Async.memoize(plus1(num))
fooMap = fooMap + ((num, plus))
plus
}
res.flatten
}
println("start")
val call1 = mplus1(2)
val call2 = mplus1(2)
val result = (call1 *> call2).unsafeRunSync()
println(result)
println(fooMap.toString)
println("finish")
The output of this program is:
start
foo
bar
bar
3
Map(2 -> <function1>)
finish
Although the plus1 function itself only executes once (one "foo" printed), the output "bar" contained within the IO is printed twice, when I expect it to also print only once. (I have also tried flattening the IO returned by Async.memoize before storing it in the map, but that doesn't do much).
Consider following examples
Given the following helper methods
def plus1(num: Int): IO[IO[Int]] = {
IO(IO(println("plus1")) *> IO(num + 1))
}
def mPlus1(num: Int): IO[IO[Int]] = {
Async.memoize(plus1(num).flatten)
}
Let's build a program that evaluates plus1(1) twice.
val program1 = for {
io <- plus1(1)
_ <- io
_ <- io
} yield {}
program1.unsafeRunSync()
This produces the expected output of printing plus1 twice.
If you do the same but instead using the mPlus1 method
val program2 = for {
io <- mPlus1(1)
_ <- io
_ <- io
} yield {}
program2.unsafeRunSync()
It will print plus1 just once confirming that memoization is working.
The trick with the memoization is that it should be evaluated only once to have the desired effect. Consider now the following program that highlights it.
val memIo = mPlus1(1)
val program3 = for {
io1 <- memIo
io2 <- memIo
_ <- io1
_ <- io2
} yield {}
program3.unsafeRunSync()
And it outputs plus1 twice as io1 and io2 are memoized separately.
As for your example, the foo is printed once because you're using a map and update the value when it's not found and this happens only once. The bar is printed every time when IO is evaluated as you lose the memoization effect by calling res.flatten.

Scala Futures - flatMap and onFailure

If I have some computation that takes a while I might place it in a scala.concurrent.Future:
val f = Future { someLongRunningFunction() }
and let's say I want to do something else asynchronously once that computation is completed:
f.flatMap{ _ => anotherLongRunningFunction() }
In the event that f's initial block fails, how do I "idiomatically" handle this when using flatMap or other combinators? Is it merely a case of using recover or onFailure before the flatMap?
I like the elegance and simplicity of using a flatMap but it seems failure scenarios get in the way of it.
Edit: the second future is reliant on the first, hence the flatMap. I'm looking for a solution that'll elegantly let me chain like I would with flatMap but also handle failures of the first.
To quote the scaladoc for flatMap:
Creates a new future by applying a function to the successful result
of this future, and returns the result of the function as the new
future. If this future is completed with an exception then the new
future will also contain this exception.
Notice the bold, meaning that whatever you pass to flatMap will only be executed if the initial future completes successfully.
So you should only handle the result of the entire execution:
val result = future1.flatMap {
result => functionReturningFuture2(result)
}
and then:
result.onFailure // or
result.onSuccess // or
result.recover
If you have several futures you can put them in a for comprehension.
val futureResult = for {
result1 <- future1
result2 <- future2
...
} yield {
//computation with results
}
You can add a recover at the end in case you want to process any exception you may find:
futureResult.recover{
case exceptionResult: Throwable => // Process exception
}
I think this is more clean that using flatMap.
I'm only starting with Future, but here are some ideas.
If you really want to use flatMap, you have to turn the failure into a success.
for{ a <- f recover r
b <- another(a)
} yield b
This works if the return type of r is :> the result type of f.
Or you can pass the problem of what to do with the failure on to the next process
for{ a <- f map (x => Success(x)) recover (ex => Failure(ex))
b <- another(a)
} yield b
Here the argument type of another would be Try[T] where the type of f is Future[T].

Is blocking within a future still blocking?

Blocking bad, async good, but is blocking within a future still blocking? This is something that I keep coming back to; consider following pseudo-code:
def queryName(id:Id):Future[String]
def queryEveryonesNames:Future[Seq[String]] = {
val everyonesIds:Future[Seq[Id]] = getIds
val everyonesNames:Future[Seq[Future[String]]] = {
everyonesIds.map(seq.map(id=>queryName(id)))
}
// I'm trying to understand the impact of what I'll do below
everyonesNames.map(seq=>seq.map(fut=>blocking(fut, 1 s)))
}
queryEveryonesNames
In the last line I turned Future[Seq[Future[String]]] (notice future within future) into Future[Seq[String]] by blocking on the inner future.
Blocking on a future within a future feels redundant, at least here, yet having a future within a future feels redundant as well.
Can you propose a smarter way of getting rid of the inner future?
Do you think blocking on a future inside a future is bad? If so why and under what circumstances?
Yes, future blocking is blocking, you should avoid that, as resources will be blocked to wait for a result, even if they are in another thread.
If I understood correctly, your question is how to convert Future[Seq[Future[String]]] into Future[Seq[String]] in a non-blocking way.
You can do that with for-comprehensions:
val in = Future[Seq[Future[String]]]
val m = for( a <- in ) // a is Seq[Future[String]]
yield ( Future.sequence(a)) // yields m = Future[Future[Seq[String]]]
val result = for(a <- m; b <- a) yield (b) // yields Future[Seq[String]]
EDIT:
Or just:
val result = in.flatMap(a => Future.sequence(a))