Wait for future to end before printing a variable - scala

In this code I need to print the variable seq, but because println runs before the futures have completed, it prints an empty sequence. How can I wait for seq to be populated before println(seq) is executed?
object TestFutures5 extends App {
def future (i:Int) = Future { i * 10 }
val seq = Seq[Int]()
for ( x <- 1 to 10 ) {
val future2 = future(x)
future2.map { y =>
println(y)
seq :+ y
}
}
println(seq) // <-- this always prints List()
Thread.sleep(5000)
}

The print statement must be executed after all the futures have completed, which means that you need to keep a reference to each created future. Your sequence is also immutable, so seq :+ y returns a new sequence and leaves seq unchanged. If you want to do this without mutating variables, your loop should be refactored like this:
val futureResult = (1 to 10).map {
x =>
future(x)
}
Then simply use Future.sequence to group the futures and do the print:
Future.sequence(futureResult).map(res => println(res))
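Putting the two steps together, here is a minimal sketch of the whole program. It assumes the global execution context and uses Await.result to keep the main thread alive instead of the Thread.sleep(5000) from the question; the object name WaitForFutures and the helper collectAll are made up for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object WaitForFutures {
  def future(i: Int): Future[Int] = Future { i * 10 }

  // Start all ten futures, then combine them into a single Future of all results.
  // Future.sequence keeps the results in the order of the input collection.
  def collectAll(): Future[Seq[Int]] =
    Future.sequence((1 to 10).map(future))

  def main(args: Array[String]): Unit = {
    // Block the main thread until every future has completed (or 5 seconds pass).
    val seq = Await.result(collectAll(), 5.seconds)
    println(seq)
  }
}
```

Blocking with Await is only reasonable at the very edge of a program such as main; elsewhere, keep chaining with map or foreach on the combined future.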

Related

Difference between { zip map } and { flatMap map } in Future of Scala

I'm reading Hands-on Scala, and one of its exercises is parallelizing merge sort.
I want to know why a for-comprehension, which is translated into flatMap and map, takes more time than zip and map.
My code:
def mergeSortParallel0[T: Ordering](items: IndexedSeq[T]): Future[IndexedSeq[T]] = {
if (items.length <= 16) Future.successful(mergeSortSequential(items))
else {
val (left, right) = items.splitAt(items.length / 2)
for (
l <- mergeSortParallel0(left);
r <- mergeSortParallel0(right)
) yield merge(l, r)
}
}
The standard answer provided by the book:
def mergeSortParallel0[T: Ordering](items: IndexedSeq[T]): Future[IndexedSeq[T]] = {
if (items.length <= 16) Future.successful(mergeSortSequential(items))
else {
val (left, right) = items.splitAt(items.length / 2)
mergeSortParallel0(left).zip(mergeSortParallel0(right)).map{
case (sortedLeft, sortedRight) => merge(sortedLeft, sortedRight)
}
}
}
flatMap and map are sequential operations on a Scala Future and on their own have nothing to do with running things in parallel. They can be viewed as simple callbacks executed when a Future completes. In other words, the code passed to map(...) or flatMap(...) starts executing only once the previous Future has finished. In your for-comprehension, mergeSortParallel0(right) is called inside the flatMap callback, so the right half does not start sorting until the left half is done.
With zip, on the other hand, both arguments are evaluated before zip itself runs, so both Futures are already running in parallel; zip returns their results as a tuple once both are complete. Similarly, you could use zipWith, which takes a function to transform the results of two Futures (combining the zip and map operations):
mergeSortParallel0(left).zipWith(mergeSortParallel0(right)){
case (sortedLeft, sortedRight) => merge(sortedLeft, sortedRight)
}
Another way to achieve parallelism is to create the Futures outside the for-comprehension. This works because Futures in Scala are eager: they start running as soon as they are created, not when they are first used:
def mergeSortParallel0[T: Ordering](items: IndexedSeq[T]): Future[IndexedSeq[T]] = {
if (items.length <= 16) Future.successful(mergeSortSequential(items))
else {
val (left, right) = items.splitAt(items.length / 2)
val leftF = mergeSortParallel0(left)
val rightF = mergeSortParallel0(right)
for {
sortedLeft <- leftF
sortedRight <- rightF
} yield {
merge(sortedLeft, sortedRight)
}
}
}
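The difference between the two shapes can be made visible with a small experiment. The sketch below is not the merge sort itself: task is a hypothetical helper that records when it starts and ends in a shared queue, and the 200 ms sleep only stands in for real work.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EagerFutures {
  // Hypothetical helper: logs its start and end around a simulated workload.
  def task(name: String, log: ConcurrentLinkedQueue[String]): Future[String] = Future {
    log.add(s"start $name")
    blocking(Thread.sleep(200))
    log.add(s"end $name")
    name
  }

  // Futures created inside the for-comprehension: b is created only after a completes.
  def sequential(log: ConcurrentLinkedQueue[String]): Future[String] =
    for {
      a <- task("a", log)
      b <- task("b", log)
    } yield a + b

  // Futures created up front: both have been started before the for-comprehension begins.
  def parallel(log: ConcurrentLinkedQueue[String]): Future[String] = {
    val fa = task("a", log)
    val fb = task("b", log)
    for { a <- fa; b <- fb } yield a + b
  }
}
```

In the sequential version the log is always start a, end a, start b, end b. In the parallel version both tasks can start before either ends, provided the execution context has a free thread for each.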

Have access in Future.sequence to object that contains Future

I have the following function in Scala:
object TestFutures5 extends App {
def future (i:Int) = Future { i * 10 }
var rnd = scala.util.Random
val futureResult = (1 to 10).map {
x =>
val y = rnd.nextInt(x)
(future(x),y) // <- this should return a future
// as it needs to be processed by Future.sequence
}
Future.sequence(futureResult).foreach(list => println(list)) // <- this does not compile
Thread.sleep(5000)
}
In the function passed to Future.sequence(...).foreach I need access to both the result of future(x) and the corresponding y, but since sequence works only with futures, this code does not compile. How can I refactor or fix it?
You could just add y to the result of the future:
future(x).map(res => (res, y))
Future.sequence will then produce a sequence of tuples, each holding the result and its value of y.
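As a rough, self-contained version of that idea (the object name TaggedFutures and the run helper are illustrative, and the Random is passed in only to make the sketch easy to exercise):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Random

object TaggedFutures {
  def future(i: Int): Future[Int] = Future { i * 10 }

  // Pair each future's result with the y generated next to it, so every element
  // is a Future[(Int, Int)] and the collection is acceptable to Future.sequence.
  def run(rnd: Random): Future[Seq[(Int, Int)]] =
    Future.sequence((1 to 10).map { x =>
      val y = rnd.nextInt(x)
      future(x).map(res => (res, y))
    })

  def main(args: Array[String]): Unit =
    println(Await.result(run(new Random()), 5.seconds))
}
```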
Use traverse:
Future.traverse(futureResult)(pair => pair._1.map(result => (pair._2, result)))
This maps each future so that its result carries the y value.
Future.traverse(futureResult)(pair => pair._1.map(result => (pair._2, Right(result))).recover { case th => (pair._2, Left(th)) })
To keep the y value in case of failure, use recover.
To summarize: you are using y as a tag (or name) for each future computation, so that you can tell which tagged future failed.
Scala REPL
scala> Future.traverse(futureResult)(pair => pair._1.map(result => (pair._2, Right(result))).recover { case th => (pair._2, Left(th)) })
result: scala.concurrent.Future[scala.collection.immutable.IndexedSeq[(Int, scala.util.Either[Throwable,Int])]] = Future(<not completed>)

Cats Writer Vector is empty

I wrote this simple program in my attempt to learn how Cats Writer works
import cats.data.Writer
import cats.syntax.applicative._
import cats.syntax.writer._
import cats.instances.vector._
object WriterTest extends App {
type Logged2[A] = Writer[Vector[String], A]
Vector("started the program").tell
val output1 = calculate1(10)
val foo = new Foo()
val output2 = foo.calculate2(20)
val (log, sum) = (output1 + output2).pure[Logged2].run
println(log)
println(sum)
def calculate1(x : Int) : Int = {
Vector("came inside calculate1").tell
val output = 10 + x
Vector(s"Calculated value ${output}").tell
output
}
}
class Foo {
def calculate2(x: Int) : Int = {
Vector("came inside calculate 2").tell
val output = 10 + x
Vector(s"calculated ${output}").tell
output
}
}
The program works and the output is
> run-main WriterTest
[info] Compiling 1 Scala source to /Users/Cats/target/scala-2.11/classes...
[info] Running WriterTest
Vector()
50
[success] Total time: 1 s, completed Jan 21, 2017 8:14:19 AM
But why is the vector empty? Shouldn't it contain all the strings on which I used the "tell" method?
Each time you call tell on one of your Vectors, you create a Writer[Vector[String], Unit]. However, you never do anything with those Writers; you just discard them. Further, you call pure to create your final Writer, which simply creates a Writer with an empty Vector. You have to combine the writers together in a chain that carries your value and messages along.
type Logged[A] = Writer[Vector[String], A]
val (log, sum) = (for {
_ <- Vector("started the program").tell
output1 <- calculate1(10)
foo = new Foo()
output2 <- foo.calculate2(20)
} yield output1 + output2).run
def calculate1(x: Int): Logged[Int] = for {
_ <- Vector("came inside calculate1").tell
output = 10 + x
_ <- Vector(s"Calculated value ${output}").tell
} yield output
class Foo {
def calculate2(x: Int): Logged[Int] = for {
_ <- Vector("came inside calculate2").tell
output = 10 + x
_ <- Vector(s"calculated ${output}").tell
} yield output
}
Note the use of for notation. The definition of calculate1 desugars to:
def calculate1(x: Int): Logged[Int] = Vector("came inside calculate1").tell.flatMap { _ =>
val output = 10 + x
Vector(s"Calculated value ${output}").tell.map { _ => output }
}
flatMap is the monadic bind operation: it takes a monadic value (in this case a Writer) and a function that produces the next one, and joins them into a new one. Here, it makes a Writer whose log is the concatenation of both logs and whose value is that of the one on the right.
Note how there are no side effects. There is no global state by which Writer can remember all your calls to tell. You instead make many Writers and join them together with flatMap to get one big one at the end.
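To see why discarded tells vanish, it can help to look at a stripped-down writer. The sketch below is not the Cats implementation: SimpleWriter is a hypothetical stand-in, specialised to Vector[String] logs, that only shows how flatMap threads the log through.

```scala
// Hypothetical, minimal stand-in for cats.data.Writer with a Vector[String] log.
final case class SimpleWriter[A](log: Vector[String], value: A) {
  def map[B](f: A => B): SimpleWriter[B] = SimpleWriter(log, f(value))

  // Run the next step, then append its log to the log accumulated so far.
  def flatMap[B](f: A => SimpleWriter[B]): SimpleWriter[B] = {
    val next = f(value)
    SimpleWriter(log ++ next.log, next.value)
  }
}

object SimpleWriterDemo {
  def tell(msg: String): SimpleWriter[Unit] = SimpleWriter(Vector(msg), ())

  // Both tells are chained with flatMap (via the for-comprehension), so both are kept.
  def calculate1(x: Int): SimpleWriter[Int] = {
    val output = 10 + x
    for {
      _ <- tell("came inside calculate1")
      _ <- tell(s"Calculated value $output")
    } yield output
  }

  // The tell here is created and thrown away, so its message never reaches any log.
  def discarded(x: Int): SimpleWriter[Int] = {
    tell("this message is lost")
    SimpleWriter(Vector.empty, 10 + x)
  }
}
```

calculate1(10) carries both messages next to the value 20, while discarded(10) carries the same value with an empty log, which is the situation in the original program.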
The problem with your example code is that you're not using the result of the tell method.
If you take a look at its signature, you'll see this:
final class WriterIdSyntax[A](val a: A) extends AnyVal {
def tell: Writer[A, Unit] = Writer(a, ())
}
It is clear that tell returns a Writer[A, Unit], which is immediately discarded because you didn't assign it to a value or chain it with anything.
The proper way to use a Writer (and any monad in Scala) is through its flatMap method. It would look similar to this:
println(
Vector("started the program").tell.flatMap { _ =>
15.pure[Logged2].flatMap { i =>
Writer(Vector("ended program"), i)
}
}
)
The code above, when executed will give you this:
WriterT((Vector(started the program, ended program),15))
As you can see, both messages and the int are stored in the result.
Now this is a bit ugly, and Scala provides a better way to do this: for-comprehensions. A for-comprehension is a bit of syntactic sugar that allows us to write the same code this way:
println(
for {
_ <- Vector("started the program").tell
i <- 15.pure[Logged2]
_ <- Vector("ended program").tell
} yield i
)
Now, going back to your example, I would recommend changing the return type of calculate1 and calculate2 to Writer[Vector[String], Int] and then making your application compile using what I wrote above.

Dynamically create for comprehension of Futures and wait for completion

I have the following code:
// Start async functions
val async1: Future[Seq[Int]] = ...
val async2: Future[Seq[Int]] = ...
val async3: Future[Seq[Int]] = ...
// Wait for completion
(for {
a1 <- async1
a2 <- async2
a3 <- async3
} yield (a1, a2, a3)).map {
// Use the results
}
I want to improve this to handle a variable number of async functions (and not necessarily call each of them every time). What I have done so far is:
// Start the async functions ?
val asyncs: Seq[Future[Seq[Int]]] = otherList.filter(x => someCondition).map(x => asyncFunc(x))
// Wait for the functions to finish ?
(for (seqInt <- asyncs) yield seqInt).map {
case results => // <-- problem here
// Use the results
}
The problem I am having is that each result is of type Future[Seq[Int]], but I expected them to be of type (Seq[Int], Seq[Int], Seq[Int]) like in the first snippet.
In the end, what I would like to do is kick off a dynamic number of async functions which all have the same Future return type, wait for them all to finish, then use all of their results together.
Future.sequence is the key part I was missing (thanks for the comment).
// Create a list of Futures
val asyncs: Seq[Future[Seq[Int]]] = otherList.filter(x => someCondition).map(x => asyncFunc(x))
// Use Future.sequence to combine them into a single Future holding a sequence of Seq[Int]
Future.sequence(asyncs).map {
case results => // Use the results: Seq[Seq[Int]]
}.recover {
case error => // Oh no!
}
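For reference, a compilable sketch of the same pattern. The body of asyncFunc, the even-number filter, and the choice to fall back to an empty result on failure are all placeholders for whatever the real code does.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DynamicFutures {
  // Hypothetical stand-in for asyncFunc from the question.
  def asyncFunc(x: Int): Future[Seq[Int]] = Future {
    if (x < 0) throw new IllegalArgumentException(s"negative input: $x")
    Seq(x, x * 2)
  }

  // Start one future per selected input, then wait for all of them as one Future.
  // If any single future fails, the combined future fails and recover takes over.
  def runAll(inputs: Seq[Int]): Future[Seq[Seq[Int]]] =
    Future.sequence(inputs.filter(_ % 2 == 0).map(asyncFunc))
      .recover { case _: IllegalArgumentException => Seq.empty[Seq[Int]] }

  def main(args: Array[String]): Unit =
    println(Await.result(runAll(Seq(1, 2, 3, 4)), 5.seconds))
}
```

Note that the futures are already running by the time Future.sequence is called; it only gathers their results.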

Create Future without starting it

This is a follow-up to my previous question
Suppose I want to create a future with my function but don't want to start it immediately (i.e. I do not want to call val f = Future { ... // my function }).
Now I see it can be done as follows:
val p = Promise[Unit]()
val f = p.future map { _ =>
// my function here
}
Is this the only way to create a future from my function without executing it?
You can do something like this:
val p = Promise[Unit]()
val f = p.future
//... some code run at a later time
p.success {
// your function
}
LATER EDIT:
I think the pattern you're looking for can be encapsulated like this:
class LatentComputation[T](f: => T) {
private val p = Promise[T]()
def trigger(): Unit = { p.success(f) }
def future: Future[T] = p.future
}
object LatentComputation {
def apply[T](f: => T) = new LatentComputation(f)
}
You would use it like this:
val comp = LatentComputation {
// your code to be executed later
}
val f = comp.future
// somewhere else in the code
comp.trigger()
You could always defer creation with a closure. You won't get the future object right away, but you get a handle that you can call later.
type DeferredComputation[T,R] = T => Future[R]
def deferredCall[T,R](futureBody: T => R): DeferredComputation[T,R] =
t => Future {futureBody(t)}
def deferredResult[R](futureBody: => R): DeferredComputation[Unit,R] =
_ => Future {futureBody}
If you are getting too fancy with execution control, maybe you should be using actors instead?
Or, perhaps, you should be using a Promise instead of a Future: a Promise can be passed on to others, while you keep it to "fulfill" it at a later time.
It's also worth giving a plug to Promise.completeWith.
You already know how to use p.future onComplete mystuff.
You can trigger that from another future using p completeWith f.
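A small sketch of that idea, assuming the global execution context; the object name DeferredStart and the 21 * 2 computation are placeholders:

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object DeferredStart {
  // Returns whether the future was complete before the work started, and its final value.
  def demo(): (Boolean, Int) = {
    val p = Promise[Int]()
    val f: Future[Int] = p.future       // a handle that can be handed out right away

    val completedEarly = f.isCompleted  // nothing has been started yet

    // Later: start the real computation and route its outcome into the promise.
    p.completeWith(Future { 21 * 2 })

    (completedEarly, Await.result(f, 5.seconds))
  }
}
```

Callbacks registered on f before completeWith is called will fire once the inner future finishes.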
You can also define a function that creates and returns the Future, and then call it:
val double = (value: Int) => {
val f = Future { Thread.sleep(1000); value * 2 }
f.onComplete(x => println(s"Future return: $x"))
f
}
println("Before future.")
double(2)
println("After future is called, but as the future takes 1 sec to run, it will be printed before.")
I used this to execute futures in batches of n, something like:
// The function that returns the future.
val double = (i: Int) => {
val future = Future ({
println(s"Start task $i")
Thread.sleep(1000)
i * 2
})
future.onComplete(_ => {
println(s"Task $i ended")
})
future
}
val numbers = 1 to 20
numbers
.map(i => (i, double))
.grouped(5)
.foreach(batch => {
val result = Await.result( Future.sequence(batch.map{ case (i, callback) => callback(i) }), 5.minutes )
println(result)
})
Or just use regular methods that return futures, and fire them in series using something like a for-comprehension (sequential call-site evaluation).
This is a well-known problem with the standard library's Future: it is designed in such a way that it is not referentially transparent, since it evaluates eagerly and memoizes its result. In most use cases this is totally fine, and Scala developers rarely need to create a non-evaluated future.
The following program:
val x = Future(...); f(x, x)
is not the same program as
f(Future(...), Future(...))
because in the first case the future is evaluated once, whereas in the second case it is evaluated twice.
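The difference can be observed by counting how often the body runs. In this sketch, pair plays the role of f and the AtomicInteger counter is only there to make the evaluations visible; both are illustrative.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EagerMemoized {
  // Plays the role of f(x, y) in the text.
  def pair(a: Future[Int], b: Future[Int]): Future[(Int, Int)] = a.zip(b)

  // One future, used twice: the body runs once and its result is reused.
  def shared(): ((Int, Int), Int) = {
    val counter = new AtomicInteger(0)
    val x = Future(counter.incrementAndGet())
    val result = Await.result(pair(x, x), 5.seconds)
    (result, counter.get())
  }

  // Two futures built from the same expression: the body runs twice.
  def separate(): ((Int, Int), Int) = {
    val counter = new AtomicInteger(0)
    val result = Await.result(
      pair(Future(counter.incrementAndGet()), Future(counter.incrementAndGet())),
      5.seconds
    )
    (result, counter.get())
  }
}
```

shared() ends with the counter at 1, while separate() ends with it at 2, even though the two programs differ only by inlining x.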
There are libraries that provide the necessary abstractions to work with referentially transparent asynchronous tasks, whose evaluation is deferred and not memoized unless explicitly required by the developer.
Scalaz Task
Monix Task
fs2
If you are looking to use Cats, Cats Effect works nicely with both Monix and fs2.
This is a bit of a hack, since it has nothing to do with how futures work, but just adding lazy would suffice:
lazy val f = Future { ... /* my function */ }
But note that this is sort of a type change as well, because wherever you reference it you will need to make that reference lazy too, or the future will be executed at that point.