For loop containing Scala Futures modifying a List - scala

Let's say I have a ListBuffer[Int] and I iterate it with a foreach loop, and each loop will modify this list from inside a Future (removing the current element), and will do something special when the list is empty. Example code:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.collection.mutable.ListBuffer
val l = ListBuffer(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
l.foreach(n => Future {
println(s"Processing $n")
Future {
l -= n
println(s"Removed $n")
if (l.isEmpty) println("List is empty!")
}
})
This is probably going to end very badly. My real code is more complex but has a similar structure and the same needs, and I do not know how to restructure it to get the same functionality in a more reliable way.

The way you present your problem does not really fit the functional paradigm that Scala is intended for.
What you seem to want is to run a list of asynchronous computations, do something at the end of each one, and do something else when all of them have finished. This is pretty simple if you use continuations, which are easy to express with the map and flatMap methods on Future.
val fa: Future[Int] = Future { 1 }
// will apply the function to the result when it becomes available
val fb: Future[Int] = fa.map(a => a + 1)
// will start the asynchronous computation with the result once it becomes available
val fc: Future[Int] = fa.flatMap(a => Future { a + 2 })
Once you have all this, you can easily do something when each of your Future completes (successfully):
val myFutures: List[Future[Int]] = ???
myFutures.map(futInt => futInt.map(int => int + 2))
Here, I will add 2 to each value I get from the different asynchronous computations in the List.
You can also choose to wait for all the Futures in your list to complete by using Future.sequence:
val myFutureList: Future[List[Int]] = Future.sequence(myFutures)
Once again, you get a Future, which will be resolved when all of the Futures inside the input list have resolved successfully, or will fail if any one of your Futures fails. You'll then be able to use map or flatMap on this new Future to use all the computed values at once.
So here's how I would write the code you proposed:
val l = 1 to 10
val processings: Seq[Future[Unit]] = l.map {n =>
Future(println(s"processing $n")).map {_ =>
println(s"finished processing $n")
}
}
val processingOver: Future[Unit] =
Future.sequence(processings).map { (lu: Seq[Unit]) =>
println(s"Finished processing ${lu.size} elements")
}
Of course, I would recommend having real functions rather than procedures (returning Unit), so that you have values to work with. I used println so that the code produces the same output as yours (except that the prints have a slightly different meaning, since we are not mutating anything anymore).
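To illustrate the point about returning values rather than Unit, here is a minimal sketch. The names ProcessingDemo and process are made up for the example, and the doubling is only placeholder work standing in for your real processing.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ProcessingDemo {
  // Hypothetical stand-in for the real per-element work: it returns a value instead of Unit.
  def process(n: Int): Future[Int] = Future(n * 2)

  // Starts every computation, then combines the results once all of them have completed.
  def run(): Int = {
    val processings: Seq[Future[Int]] = (1 to 10).map(process)
    val total: Future[Int] = Future.sequence(processings).map(_.sum)
    Await.result(total, 5.seconds) // blocking only for the sake of the demo
  }
}
```

Because each step yields a value, the "everything is finished" step can actually use the results instead of only printing.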

Related

Why is Future considered to be "not referentially transparent"?

So I was reading the "Scala with Cats" book, and there was this sentence which I'm going to quote down here:
Note that Scala’s Futures aren’t a great example of pure functional programming because they aren’t referentially transparent.
And also, an example is provided as follows:
val future1 = {
// Initialize Random with a fixed seed:
val r = new Random(0L)
// nextInt has the side-effect of moving to
// the next random number in the sequence:
val x = Future(r.nextInt)
for {
a <- x
b <- x
} yield (a, b)
}
val future2 = {
val r = new Random(0L)
for {
a <- Future(r.nextInt)
b <- Future(r.nextInt)
} yield (a, b)
}
val result1 = Await.result(future1, 1.second)
// result1: (Int, Int) = (-1155484576, -1155484576)
val result2 = Await.result(future2, 1.second)
// result2: (Int, Int) = (-1155484576, -723955400)
I mean, I think it's because r.nextInt is never referentially transparent, right? Since identity(r.nextInt) would never be equal to identity(r.nextInt), does this mean that identity is not referentially transparent either (or the Identity monad, to have a better comparison with Future)? If the expression being calculated is RT, then the Future would also be RT:
def foo(): Int = 42
val x = Future(foo())
Await.result(x, ...) == Await.result(Future(foo()), ...) // true
So as far as I can reason about the example, almost every function and Monad type should be non-RT. Or is there something special about Future? I also read this question and its answers, yet couldn't find what I was looking for.
You are actually right, and you are touching on one of the pickiest points of FP, at least in Scala.
Technically speaking, Future on its own is RT. The important thing is that, unlike IO, it can't wrap non-RT things into an RT description. However, you can say the same of many other types, like List or Option; so why don't folks make a fuss about those?
Well, as with many things, the devil is in the details.
Contrary to List or Option, Future is typically used with non-RT things, e.g. an HTTP request or a database query. Hence the emphasis folks put on showing that Future can't guarantee RT in those situations.
More importantly, there is only one reason to introduce Future into a codebase: concurrency (not to be confused with parallelism); otherwise, it would be the same as Try. Thus, controlling when and how those computations are executed is usually important.
Which is the reason why cats recommends the use of IO for all use cases of Future.
Note: You can find a similar discussion on this cats PR and its linked discussions: https://github.com/typelevel/cats/pull/4182
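To make "wrapping non-RT things into an RT description" a bit more concrete, here is a minimal sketch that reruns the book's example with a lazy wrapper. Lazy is a hypothetical toy type written for this post, not the real IO from cats-effect; it only shows the idea that a description does nothing until it is run.

```scala
import scala.util.Random

// Hypothetical, deliberately tiny stand-in for IO: it only describes a computation.
final class Lazy[A](thunk: () => A) {
  def run(): A = thunk()
  def map[B](f: A => B): Lazy[B] = new Lazy(() => f(thunk()))
  def flatMap[B](f: A => Lazy[B]): Lazy[B] = new Lazy(() => f(thunk()).run())
}
object Lazy {
  def apply[A](a: => A): Lazy[A] = new Lazy(() => a)
}

object LazyDemo {
  // Version 1: the description is bound to a name and used twice.
  def program1(): (Int, Int) = {
    val r = new Random(0L)
    val x = Lazy(r.nextInt())
    (for { a <- x; b <- x } yield (a, b)).run()
  }

  // Version 2: the name is replaced by its definition.
  def program2(): (Int, Int) = {
    val r = new Random(0L)
    (for { a <- Lazy(r.nextInt()); b <- Lazy(r.nextInt()) } yield (a, b)).run()
  }
}
```

Both versions call r.nextInt twice when run, so substituting the definition for the name does not change the result, which is exactly what fails with the eager Future.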
So... referential transparency simply means that you should be able to replace a reference with the actual thing (and vice versa) without changing the overall semantics or behaviour. Just like in mathematics.
So, let's say you have x = 4 and y = 5; then x + y, 4 + y, x + 5, and 4 + 5 are all the same thing, and can be replaced with each other whenever you want.
But... just look at the following two programs:
// Program 1
val f1 = Future { println("Hi") }
val f2 = f1
// Program 2
val f1 = Future { println("Hi") }
val f2 = Future { println("Hi") }
You can try to run them. The behaviour of these two programs is not going to be the same.
Scala Futures are eagerly evaluated, which means that there is no way to write Future { println("Hi") } in your code without it executing as a separate behaviour.
Keep in mind that this is not just linked to having side effects. Yes, the example I used here with println was a side effect, but that was just to make the difference in behaviour easy to notice.
Even if you use something to suspend the side effect inside the Future, you will end up with two suspended side effects instead of one. And once these suspended side effects are passed to the interpreter, the same action will happen twice.
In the following example, even though we suspend the print side effect by wrapping it in an IO, the expensive evaluation part of the program can still cause different behaviours, even if everything in the universe is exactly the same in the two cases.
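import cats.effect.IO // assumption: IO is the cats-effect IO; the original snippet does not import it
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
// CPU-bound
// takes around 80 milliseconds
// we have only 1 core
def veryExpensiveComputation(input: Int): Int = ???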
def impl1(): Unit = {
val f1 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val f2 = f1
val f3 = f1
val futures = Future.sequence(Seq(f1, f2, f3))
val ios = Await.result(futures, 100.millis)
}
def impl2(): Unit = {
val f1 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val f2 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val f3 = Future {
val result = veryExpensiveComputation(10)
IO {
println(result)
result
}
}
val futures = Future.sequence(Seq(f1, f2, f3))
val ios = Await.result(futures, 100.millis)
}
The first impl will trigger only 1 expensive computation, but the second will trigger 3. With a single core and around 80 milliseconds per computation, the second implementation will therefore fail with a timeout.
If properly written with IO or ZIO (without Future), it will fail with a timeout in both implementations.
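The difference in the number of evaluations can be checked without relying on timing. Here is a minimal sketch that uses a counter in place of veryExpensiveComputation; EagerFutureDemo, shared and inlined are hypothetical names for the example.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EagerFutureDemo {
  // Reusing one Future: the body runs once, however many names refer to it.
  def shared(): Int = {
    val runs = new AtomicInteger(0)
    val f1 = Future { runs.incrementAndGet(); 42 }
    val f2 = f1
    val f3 = f1
    Await.result(Future.sequence(Seq(f1, f2, f3)), 5.seconds)
    runs.get()
  }

  // Substituting the definition for the name: the body runs three times.
  def inlined(): Int = {
    val runs = new AtomicInteger(0)
    def body(): Int = { runs.incrementAndGet(); 42 }
    val futures = Seq(Future(body()), Future(body()), Future(body()))
    Await.result(Future.sequence(futures), 5.seconds)
    runs.get()
  }
}
```

Here shared() counts one run and inlined() counts three, even though the two methods differ only by replacing a reference with its definition.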

Conditional chain of futures

I have a sequence of parameters. For each parameter I have to perform a DB query, which may or may not return a result. Simply put, I need to stop after the first non-empty result. Of course, I would like to avoid making unnecessary calls. The caveat is that I need this whole operation to be contained in another Future, or to use whatever the "most reactive" approach is.
Speaking of code:
// this is what I have
def dbQuery(p: Param): Future[Option[Result]] = ???
// my list of params
val input = Seq(p1, p2, p3)
// this is what I need to implement
def getFirstNonEmpty(params: Seq[Param]): Future[Option[Result]] = ???
I know I could just wrap the entire function in yet another Future and execute the code sequentially (Await? Brrr...), but that's not the cleanest solution.
Can I somehow create a lazily initialized collection of futures, like
params.map ( p => FutureWhichWontStartUnlessAskedWhichWrapsOtherFuture { dbQuery(p) }).findFirst(!_.isEmpty())
I believe it's possible!
What do you think about something like this?
def getFirstNonEmpty(params: Seq[Param]): Future[Option[Result]] = {
params.foldLeft(Future.successful(Option.empty[Result])) { (accuFtrOpt, param) =>
accuFtrOpt.flatMap {
case None => dbQuery(param)
case result => Future.successful(result)
}
}
}
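To check that this really stops querying after the first hit, here is a small self-contained sketch. Param and Result are aliased to Int and String purely for the demo, the fake dbQuery is passed in as an argument, and FirstNonEmptyDemo is a hypothetical name.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FirstNonEmptyDemo {
  // Hypothetical placeholder types; the real Param and Result come from your code base.
  type Param = Int
  type Result = String

  def getFirstNonEmpty(params: Seq[Param])(
      dbQuery: Param => Future[Option[Result]]): Future[Option[Result]] =
    params.foldLeft(Future.successful(Option.empty[Result])) { (acc, param) =>
      acc.flatMap {
        case None   => dbQuery(param)            // nothing found yet: run the next query
        case result => Future.successful(result) // already found: skip the query
      }
    }

  // Returns the result together with the parameters that were actually queried.
  def run(): (Option[Result], List[Param]) = {
    val queried = ListBuffer.empty[Param]
    // Fake query: only parameters >= 3 produce a result.
    def dbQuery(p: Param): Future[Option[Result]] = Future {
      queried.synchronized { queried += p }
      if (p >= 3) Some(s"result-$p") else None
    }
    val result = Await.result(getFirstNonEmpty(Seq(1, 2, 3, 4, 5))(dbQuery), 5.seconds)
    (result, queried.synchronized(queried.toList))
  }
}
```

Note that the fold still visits every parameter, but once a result has been found the remaining steps only rewrap it with Future.successful and never call the query.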
This might be overkill, but if you are open to using scalaz we can do this using OptionT and foldMap.
With OptionT we sort of combine Future and Option into one structure. We can get the first of two Futures with a non-empty result using OptionT.orElse.
import scalaz._, Scalaz._
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val someF: Future[Option[Int]] = Future.successful(Some(1))
val noneF: Future[Option[Int]] = Future.successful(None)
val first = OptionT(noneF) orElse OptionT(someF)
first.run // Future[Option[Int]] = Success(Some(1))
We could now get the first non-empty Future from a List with reduce from the standard library (this will however run all the Futures) :
List(noneF, noneF, someF).map(OptionT.apply).reduce(_ orElse _).run
But with a List (or other collection) we can't be sure that there is at least one element, so we need to use fold and pass a start value. Scalaz can do this work for us by using a Monoid. The Monoid[OptionT[Future, Int]] we will use will supply the start value and combine the Futures with the orElse used above.
type Param = Int
type Result = Int
type FutureO[x] = OptionT[Future, x]
def query(p: Param): Future[Option[Result]] =
Future.successful{ println(p); if (p > 2) Some(p) else None }
def getFirstNonEmpty(params: List[Param]): Future[Option[Result]] = {
implicit val monoid = PlusEmpty[FutureO].monoid[Result]
params.foldMap(p => OptionT(query(p))).run
}
val result = getFirstNonEmpty(List(1,2,3,4))
// prints 1, 2, 3
result.foreach(println) // Some(3)
This is an old question, but if someone comes looking for an answer, here is my take. I solved it for a use case that required me to loop through a limited number of futures sequentially and stop when the first of them returned a result.
I did not need a library for my use-case, a light-weight combination of recursion and pattern matching was sufficient. Although the question here does not have the same problem as a sequence of futures, looping through a sequence of parameters would be similar.
Here is a sketch based on recursion.
I have not compiled this; it assumes an implicit ExecutionContext is in scope for flatMap.
def getFirstNonEmpty(params: Seq[Param]): Future[Option[Result]] = {
  if (params.isEmpty) {
    Future.successful(None)
  } else {
    // dbQuery returns a Future, so chain on its result with flatMap instead of matching on it directly
    dbQuery(params.head).flatMap {
      case Some(v) => Future.successful(Some(v))
      case None => getFirstNonEmpty(params.tail)
    }
  }
}

Scala - Execute arbitrary number of Futures sequentially but dependently [duplicate]

I'm trying to figure out the neatest way to execute a series of Futures in sequence, where one Future's execution depends on the previous. I'm trying to do this for an arbitrary number of futures.
Use case:
I have retrieved a number of Ids from my database.
I now need to retrieve some related data on a web service.
I want to stop once I've found a valid result.
I only care about the result that succeeded.
Executing these all in parallel and then parsing the collection of results returned isn't an option. I have to do one request at a time, and only execute the next request if the previous request returned no results.
The current solution is along these lines, using foldLeft to execute the requests and only evaluating the next future if the previous future meets some condition.
def dblFuture(i: Int) = Future { i * 2 }
val list = List(1,2,3,4,5)
val future = list.foldLeft(Future(0)) {
(previousFuture, next) => {
for {
previousResult <- previousFuture
nextFuture <- { if (previousResult <= 4) dblFuture(next) else previousFuture }
} yield (nextFuture)
}
}
The big downside of this is that a) I keep processing all items even once I've got a result I'm happy with, and b) once I've found the result I'm after, I keep evaluating the predicate. In this case it's a simple if, but in reality it could be more complicated.
I feel like I'm missing a far more elegant solution to this.
Looking at your example, it seems as though the previous result has no bearing on subsequent results, and instead what only matters is that the previous result satisfies some condition to prevent the next result from being computed. If that is the case, here is a recursive solution using filter and recoverWith.
def untilFirstSuccess[A, B](f: A => Future[B])(condition: B => Boolean)(list: List[A]): Future[B] = {
list match {
case head :: tail => f(head).filter(condition).recoverWith { case _: Throwable => untilFirstSuccess(f)(condition)(tail) }
case Nil => Future.failed(new Exception("All failed.."))
}
}
filter will only be called when the Future has completed, and recoverWith will only be called if the Future has failed.
def dblFuture(i: Int): Future[Int] = Future {
println("Executing.. " + i)
i * 2
}
val list = List(1, 2, 3, 4, 5)
scala> untilFirstSuccess(dblFuture)(_ > 6)(list)
Executing.. 1
Executing.. 2
Executing.. 3
Executing.. 4
res1: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise@514f4e98
scala> res1.value
res2: Option[scala.util.Try[Int]] = Some(Success(8))
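One thing to be aware of: recoverWith { case _: Throwable => ... } treats a failed f(head) the same way as a result that does not satisfy the condition. If you would rather let real failures propagate, a variant along these lines might work. This is only a sketch; firstMatching and FirstMatchingDemo are hypothetical names, and the return type changes to Future[Option[B]] so that "nothing matched" is not an exception.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FirstMatchingDemo {
  // Tries each element in order and stops at the first result satisfying `condition`.
  // A failed Future from `f` is not retried: it fails the overall result.
  def firstMatching[A, B](f: A => Future[B])(condition: B => Boolean)(list: List[A])(
      implicit ec: ExecutionContext): Future[Option[B]] =
    list match {
      case head :: tail =>
        f(head).flatMap { b =>
          if (condition(b)) Future.successful(Some(b))
          else firstMatching(f)(condition)(tail)
        }
      case Nil => Future.successful(None)
    }

  // Returns the result together with the inputs that were actually executed.
  def run(): (Option[Int], List[Int]) = {
    import ExecutionContext.Implicits.global
    val executed = ListBuffer.empty[Int]
    def dblFuture(i: Int): Future[Int] = Future {
      executed.synchronized { executed += i }
      i * 2
    }
    val result = Await.result(firstMatching(dblFuture)(_ > 6)(List(1, 2, 3, 4, 5)), 5.seconds)
    (result, executed.synchronized(executed.toList))
  }
}
```

With the same inputs as above, only 1 to 4 are executed and the result is Some(8).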
The neatest way, and "true functional programming", is scalaz-stream ;) However, you'll need to switch from Scala's Future to scalaz.concurrent.Task as the abstraction for a "future result". It's a bit different: Task is pure, while Future is a "running computation", but they have a lot in common.
import scalaz.concurrent.Task
import scalaz.stream.Process
def dblTask(i: Int) = Task {
println(s"Executing task $i")
i * 2
}
val list = Seq(1,2,3,4,5)
val p: Process[Task, Int] = Process.emitAll(list)
val result: Task[Option[Int]] =
p.flatMap(i => Process.eval(dblTask(i))).takeWhile(_ < 10).runLast
println(s"result = ${result.run}")
Result:
Executing task 1
Executing task 2
Executing task 3
Executing task 4
Executing task 5
result = Some(8)
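If your computation is already a Scala Future, you can transform it to a Task:
import scala.concurrent.{Future => SFuture} // assumed alias; the original snippet does not show this import
implicit class Transformer[+T](fut: => SFuture[T]) {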
def toTask(implicit ec: scala.concurrent.ExecutionContext): Task[T] = {
import scala.util.{Failure, Success}
import scalaz.syntax.either._
Task.async {
register =>
fut.onComplete {
case Success(v) => register(v.right)
case Failure(ex) => register(ex.left)
}
}
}
}

How do you stop building an Option[Collection] upon reaching the first None?

When building up a collection inside an Option, each attempt to make the next member of the collection might fail, making the collection as a whole a failure, too. Upon the first failure to make a member, I'd like to give up immediately and return None for the whole collection. What is an idiomatic way to do this in Scala?
Here's one approach I've come up with:
def findPartByName(name: String): Option[Part] = . . .
def allParts(names: Seq[String]): Option[Seq[Part]] =
names.foldLeft(Some(Seq.empty): Option[Seq[Part]]) {
(result, name) => result match {
case Some(parts) =>
findPartByName(name) flatMap { part => Some(parts :+ part) }
case None => None
}
}
In other words, if any call to findPartByName returns None, allParts returns None. Otherwise, allParts returns a Some containing a collection of Parts, all of which are guaranteed to be valid. An empty collection is OK.
The above has the advantage that it stops calling findPartByName after the first failure. But the foldLeft still iterates once for each name, regardless.
Here's a version that bails out as soon as findPartByName returns a None:
def allParts2(names: Seq[String]): Option[Seq[Part]] = Some(
for (name <- names) yield findPartByName(name) match {
case Some(part) => part
case None => return None
}
)
I currently find the second version more readable, but (a) what seems most readable is likely to change as I get more experience with Scala, (b) I get the impression that early return is frowned upon in Scala, and (c) neither one seems to make what's going on especially obvious to me.
The combination of "all-or-nothing" and "give up on the first failure" seems like such a basic programming concept, I figure there must be a common Scala or functional idiom to express it.
The return in your code is actually a couple levels deep in anonymous functions. As a result, it must be implemented by throwing an exception which is caught in the outer function. This isn't efficient or pretty, hence the frowning.
It is easiest and most efficient to write this with a while loop and an Iterator.
def allParts3(names: Seq[String]): Option[Seq[Part]] = {
val iterator = names.iterator
var accum = List.empty[Part]
while (iterator.hasNext) {
findPartByName(iterator.next) match {
case Some(part) => accum +:= part
case None => return None
}
}
Some(accum.reverse)
}
Because we don't know what kind of Seq names is, we must create an iterator to loop over it efficiently rather than using tail or indexes. The while loop can be replaced with a tail-recursive inner function, but with the iterator a while loop is clearer.
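For comparison, here is what the tail-recursive inner function mentioned above could look like. It is only a sketch: Part and findPartByName are hypothetical stand-ins for the question's definitions (here, an empty name counts as "not found"), and AllPartsDemo is a made-up name.

```scala
import scala.annotation.tailrec

object AllPartsDemo {
  // Hypothetical stand-ins for the question's Part type and lookup function.
  final case class Part(name: String)
  def findPartByName(name: String): Option[Part] =
    if (name.nonEmpty) Some(Part(name)) else None

  def allPartsRec(names: Seq[String]): Option[Seq[Part]] = {
    val iterator = names.iterator
    @tailrec
    def loop(accum: List[Part]): Option[Seq[Part]] =
      if (!iterator.hasNext) Some(accum.reverse)
      else findPartByName(iterator.next()) match {
        case Some(part) => loop(part :: accum)
        case None       => None // give up on the first failure
      }
    loop(Nil)
  }
}
```

It avoids both the var and the return, at the cost of threading the accumulator through the recursive call.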
Scala collections have some options to use laziness to achieve that.
You can use view and takeWhile:
def allPartsWithView(names: Seq[String]): Option[Seq[Part]] = {
val successes = names.view.map(findPartByName)
.takeWhile(!_.isEmpty)
.map(_.get)
.force
if (!names.isDefinedAt(successes.size)) Some(successes)
else None
}
Using isDefinedAt avoids potentially traversing a long input names in the case of an early failure.
You could also use toStream and span to achieve the same thing:
def allPartsWithStream(names: Seq[String]): Option[Seq[Part]] = {
val (good, bad) = names.toStream.map(findPartByName)
.span(!_.isEmpty)
if (bad.isEmpty) Some(good.map(_.get).toList)
else None
}
I've found trying to mix view and span causes findPartByName to be evaluated twice per item in case of success.
The whole idea of returning an error condition if any error occurs does, however, sound more like a job ("the" job?) for throwing and catching exceptions. I suppose it depends on the context in your program.
Combining the other answers, i.e., a mutable flag with the map and takeWhile we love.
Given an infinite stream:
scala> var count = 0
count: Int = 0
scala> val vs = Stream continually { println(s"Compute $count") ; count += 1 ; count }
Compute 0
vs: scala.collection.immutable.Stream[Int] = Stream(1, ?)
Take until a predicate fails:
scala> var failed = false
failed: Boolean = false
scala> vs map { case x if x < 5 => println(s"Yup $x"); Some(x) case x => println(s"Nope $x"); failed = true; None } takeWhile (_.nonEmpty) map (_.get)
Yup 1
res0: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> .toList
Compute 1
Yup 2
Compute 2
Yup 3
Compute 3
Yup 4
Compute 4
Nope 5
res1: List[Int] = List(1, 2, 3, 4)
or more simply:
scala> var count = 0
count: Int = 0
scala> val vs = Stream continually { println(s"Compute $count") ; count += 1 ; count }
Compute 0
vs: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> var failed = false
failed: Boolean = false
scala> vs map { case x if x < 5 => println(s"Yup $x"); x case x => println(s"Nope $x"); failed = true; -1 } takeWhile (_ => !failed)
Yup 1
res3: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> .toList
Compute 1
Yup 2
Compute 2
Yup 3
Compute 3
Yup 4
Compute 4
Nope 5
res4: List[Int] = List(1, 2, 3, 4)
I think your allParts2 function has a problem as one of the two branches of your match statement will perform a side effect. The return statement is the not-idiomatic bit, behaving as if you are doing an imperative jump.
The first function looks better, but if you are concerned about the sub-optimal iteration that foldLeft could produce, you should probably go for a recursive solution like the following:
import scala.annotation.tailrec
def allParts(names: Seq[String]): Option[Seq[Part]] = {
  @tailrec
  def allPartsRec(names: Seq[String], acc: Seq[Part]): Option[Seq[Part]] = names match {
    case Seq(x, xs @ _*) => findPartByName(x) match {
      case Some(part) => allPartsRec(xs, acc :+ part)
      case None => None
    }
    case _ => Some(acc)
  }
  allPartsRec(names, Seq.empty)
}
I didn't compile/run it but the idea should be there and I believe it is more idiomatic than using the return trick!
I keep thinking that this has to be a one- or two-liner. I came up with one:
def allParts4(names: Seq[String]): Option[Seq[Part]] = Some(
names.map(findPartByName(_) getOrElse { return None })
)
Advantage:
The intent is extremely clear. There's no clutter and there's no exotic or nonstandard Scala.
Disadvantages:
The early return violates referential transparency, as Aldo Stracquadanio pointed out. You can't put the body of allParts4 into its calling code without changing its meaning.
Possibly inefficient due to the internal throwing and catching of an exception, as wingedsubmariner pointed out.
Sure enough, I put this into some real code, and within ten minutes, I'd enclosed the expression inside something else, and predictably got surprising behavior. So now I understand a little better why early return is frowned upon.
This is such a common operation, so important in code that makes heavy use of Option, and Scala is normally so good at combining things, I can't believe there isn't a pretty natural idiom to do it correctly.
Aren't monads good for specifying how to combine actions? Is there a GiveUpAtTheFirstSignOfResistance monad?
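For what it's worth, Option's own flatMap already gives up at the first None, so a plain for comprehension inside a recursion expresses "all or nothing, stop at the first failure" without any return. A minimal sketch using only the standard library: allParts5 and OptionShortCircuitDemo are hypothetical names, Part is a stand-in type, and the lookup is passed in as a parameter so the demo can count calls.

```scala
object OptionShortCircuitDemo {
  // Hypothetical stand-in for the question's Part type.
  final case class Part(name: String)

  // Not tail-recursive: fine for short lists, but it recurses once per element.
  def allParts5(names: List[String])(find: String => Option[Part]): Option[List[Part]] =
    names match {
      case Nil => Some(Nil)
      case name :: rest =>
        for {
          part  <- find(name)            // a None here skips everything below
          parts <- allParts5(rest)(find)
        } yield part :: parts
    }

  // Returns the result and the number of lookups that were performed.
  def run(names: List[String]): (Option[List[Part]], Int) = {
    var lookups = 0
    def find(name: String): Option[Part] = {
      lookups += 1
      if (name.nonEmpty) Some(Part(name)) else None
    }
    val result = allParts5(names)(find)
    (result, lookups)
  }
}
```

With the names "a", "" and "c", the lookup is called only twice and the whole result is None.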

Create Future without starting it

This is a follow-up to my previous question
Suppose I want to create a future with my function but don't want to start it immediately (i.e. I do not want to call val f = Future { ... // my function }).
Now I see it can be done as follows:
val p = Promise[Unit]()
val f = p.future map { _ => /* my function here */ }
Is it the only way to create a future with my function without executing it?
You can do something like this
val p = Promise[Unit]()
val f = p.future
//... some code run at a later time
p.success {
// your function
}
LATER EDIT:
I think the pattern you're looking for can be encapsulated like this:
class LatentComputation[T](f: => T) {
private val p = Promise[T]()
def trigger(): Unit = { p.success(f) }
def future: Future[T] = p.future
}
object LatentComputation {
def apply[T](f: => T) = new LatentComputation(f)
}
You would use it like this:
val comp = LatentComputation {
// your code to be executed later
}
val f = comp.future
// somewhere else in the code
comp.trigger()
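Two caveats with this pattern: if f throws, the exception escapes from trigger() and the future never completes, and calling trigger() a second time throws because the promise is already completed. Here is a sketch of a slightly more defensive variant; SafeLatentComputation and LatentDemo are hypothetical names, not an established API.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Future, Promise}
import scala.util.Try

// Variant of LatentComputation: f runs at most once, on the thread that calls trigger().
class SafeLatentComputation[T](f: => T) {
  private val p = Promise[T]()
  private val started = new AtomicBoolean(false)
  // An exception thrown by f fails the future instead of escaping from trigger().
  def trigger(): Unit =
    if (started.compareAndSet(false, true)) p.complete(Try(f))
  def future: Future[T] = p.future
}

object LatentDemo {
  // Returns: was the future completed before trigger(), its final value, and how often f ran.
  def run(): (Boolean, Option[Try[Int]], Int) = {
    var runs = 0
    val comp = new SafeLatentComputation({ runs += 1; 42 })
    val completedBefore = comp.future.isCompleted // nothing has run yet
    comp.trigger()
    comp.trigger() // second call is a no-op
    (completedBefore, comp.future.value, runs)
  }
}
```

As in the original class, the computation runs synchronously inside trigger(); wrap the call in a Future if it should run on another thread.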
You could always defer creation with a closure; you won't get the future object right away, but you get a handle to call later.
type DeferredComputation[T,R] = T => Future[R]
def deferredCall[T,R](futureBody: T => R): DeferredComputation[T,R] =
t => Future {futureBody(t)}
def deferredResult[R](futureBody: => R): DeferredComputation[Unit,R] =
_ => Future {futureBody}
If you are getting too fancy with execution control, maybe you should be using actors instead?
Or, perhaps, you should be using a Promise instead of a Future: a Promise can be passed on to others, while you keep it to "fulfill" it at a later time.
It's also worth giving a plug to Promise.completeWith.
You already know how to use p.future onComplete mystuff.
You can trigger that from another future using p completeWith f.
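A minimal sketch of that idea: consumers attach to p.future up front, and the real work is wired in later with completeWith. CompleteWithDemo is a hypothetical name and the constant 21 is placeholder work.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object CompleteWithDemo {
  def run(): Int = {
    val p = Promise[Int]()
    // Consumers can attach to p.future before any work has been started.
    val doubled: Future[Int] = p.future.map(_ * 2)
    // Later: start the real work and wire its outcome into the promise.
    p.completeWith(Future(21))
    Await.result(doubled, 5.seconds) // blocking only for the sake of the demo
  }
}
```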
You can also define a function that creates and returns the Future, and then call it:
val double = (value: Int) => {
val f = Future { Thread.sleep(1000); value * 2 }
f.onComplete(x => println(s"Future return: $x"))
f
}
println("Before future.")
double(2)
println("After future is called, but as the future takes 1 sec to run, it will be printed before.")
I used this to execute futures in batches of n, something like:
// The functions that returns the future.
val double = (i: Int) => {
val future = Future ({
println(s"Start task $i")
Thread.sleep(1000)
i * 2
})
future.onComplete(_ => {
println(s"Task $i ended")
})
future
}
val numbers = 1 to 20
numbers
.map(i => (i, double))
.grouped(5)
.foreach(batch => {
val result = Await.result( Future.sequence(batch.map{ case (i, callback) => callback(i) }), 5.minutes )
println(result)
})
Or just use regular methods that return futures, and fire them in series using something like a for comprehension (sequential call-site evaluation).
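A sketch of what that looks like, assuming a hypothetical step method that returns a Future and logs when it starts; SequentialDemo is a made-up name. Because each generator in the for comprehension is only evaluated inside the previous one's flatMap, the futures are created one after another.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequentialDemo {
  // Returns the combined value and the order in which the steps were started.
  def run(): (Int, List[String]) = {
    val log = ListBuffer.empty[String]
    // A regular method: no Future exists until it is called.
    def step(name: String, value: Int): Future[Int] = Future {
      log.synchronized { log += name }
      value
    }
    val total = for {
      a <- step("first", 1)  // started immediately
      b <- step("second", 2) // started only after the first has completed
      c <- step("third", 3)  // started only after the second has completed
    } yield a + b + c
    val result = Await.result(total, 5.seconds)
    (result, log.synchronized(log.toList))
  }
}
```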
This is a well-known problem with the standard library's Future: it is designed in such a way that it is not referentially transparent, since it evaluates eagerly and memoizes its result. In most use cases this is totally fine, and Scala developers rarely need to create a non-evaluated future.
The following program:
val x = Future(...); f(x, x)
is not the same program as
f(Future(...), Future(...))
because in the first case the future is evaluated once, while in the second case it is evaluated twice.
There are libraries which provide the necessary abstractions to work with referentially transparent asynchronous tasks, whose evaluation is deferred and not memoized unless explicitly required by the developer.
Scalaz Task
Monix Task
fs2
If you are looking to use Cats, Cats Effect works nicely with both Monix and fs2.
This is a bit of a hack, since it has nothing to do with how futures work, but just adding lazy would suffice:
lazy val f = Future { ... /* my function */ }
But note that this is sort of a type change as well, because wherever you reference it you will need to declare the reference as lazy too, or it will be executed.
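A small sketch of the lazy val behaviour, with a counter standing in for the real function; LazyValDemo is a hypothetical name. The Future is only created, and therefore only started, on the first reference.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object LazyValDemo {
  // Returns: runs before the first reference, the computed value, and total runs.
  def run(): (Int, Int, Int) = {
    val runs = new AtomicInteger(0)
    lazy val f = Future { runs.incrementAndGet(); 42 }
    val before = runs.get()                // f has not been referenced yet
    val value = Await.result(f, 5.seconds) // first reference creates and starts the Future
    Await.result(f, 5.seconds)             // later references reuse the same Future
    (before, value, runs.get())
  }
}
```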