Suppose we have a class which allocates some resources on the heap, and creating it takes some time, so it's hidden behind the future
class Container(val n: Int) {
allocateMemoryOnTheHeap(n)
def cleanup(): ???
}
def prepareContainer(val n: Int): Future[Container] = ???
Our task is to download a list of values and put it inside our container class. If any of the downloads failed we want to return a failure and release the resources.
def downloadNumber(): Future[Int] = ???
def numbersProvider(howMany: Int): Future[Seq[Int]] = (0 until howMany).map(downloadNumber).map(Future.sequence)
def prepareContainersForValuesFromUpstream(): Future[Seq[Container]] = {
val futureNumbers = numbersProvider(42)
val containersFuture: Future[Seq[Future[Container]]] = futureNumbers.map {
numbers =>
numbers.map(prepareContainer(_))
}
containersFuture.flatMap(Future.sequence)
}
def main() = {
val containersFuture = prepareContainersForValuesFromUpstream()
val result = Await.result(containersFuture, timeout)
result match {
case Success(_) => println("Downloaded values and stored in containers for later!")
case Failure(_) => //todo: how to release resources from created containers?
}
}
How can we do it? I think that applying Future.sequence in the prepareContainersForValuesFromUpstream makes it impossible, so we need to address this problem before it. But applying some tricks like this (Scala: List[Future] to Future[List] disregarding failed futures) makes it hard to convert the result to the proper form.
Related
I need to execute a Future method on some elements I have in a list simultaneously. My current implementation works sequentially, which is not optimal for saving time. I did this by mapping my list and calling the method on each element and processing the data this way.
My manager shared a link with me showing how to execute Futures simultaneously using for-comprehension but I cannot see/understand how I can implement this with my List.
The link he shared with me is https://alvinalexander.com/scala/how-use-multiple-scala-futures-in-for-comprehension-loop/
Here is my current code:
private def method1(id: String): Tuple2[Boolean, List[MyObject]] = {
val workers = List.concat(idleWorkers, activeWorkers.keys.toList)
var ready = true;
val workerStatus = workers.map{ worker =>
val option = Await.result(method2(worker), 1 seconds)
var status = if (option.isDefined) {
if (option.get._2 == id) {
option.get._1.toString
} else {
"INVALID"
}
} else "FAILED"
val status = s"$worker: $status"
if (option.get._1) {
ready = false
}
MyObject(worker.toString, status)
}.toList.filterNot(s => s. status.contains("INVALID"))
(ready, workerStatus)
}
private def method2(worker: ActorRef): Future[Option[(Boolean, String)]] = Future{
implicit val timeout: Timeout = 1 seconds;
Try(Await.result(worker ? GetStatus, 1 seconds)) match {
case Success(extractedVal) => extractedVal match {
case res: (Boolean, String) => Some(res)
case _ => None
}
case Failure(_) => { None }
case _ => { None }
}
}
If someone could suggest how to implement for-comprehension in this scenario, I would be grateful. Thanks
For method2 there is no need for the Future/Await mix. Just map the Future:
def method2(worker: ActorRef): Future[Option[(Boolean, String)]] =
(worker ? GetStatus).map{
case res: (Boolean, String) => Some(res)
case _ => None
}
For method1 you likewise need to map the result of method2 and do the processing inside the map. This will make workerStatus a List[Future[MyObject]] and means that everything runs in parallel.
Then use Future.sequence(workerStatus) to turn the List[Future[MyObject]] into a Future[List[MyObject]]. You can then use map again to do the filtering/ checking on that List[MyObject]. This will happen when all the individual Futures have completed.
Ideally you would then return a Future from method1 to keep everything asynchronous. You could, if absolutely necessary, use Await.result at this point which would wait for all the asynchronous operations to complete (or fail).
I have a program which consumes an infinite stream of data. Along the way I'd like to record some metrics, which form a monoid since they're just simple sums and averages. Periodically, I want to write out these metrics somewhere, clear them, and return to accumulating them. I have essentially:
object Foo {
type MetricsIO[A] = StateT[IO, MetricData, A]
def recordMetric(m: MetricData): MetricsIO[Unit] = {
StateT.modify(_.combine(m))
}
def sendMetrics: MetricsIO[Unit] = {
StateT.modifyF { s =>
val write: IO[Unit] = writeMetrics(s)
write.attempt.map {
case Left(_) => s
case Right(_) => Monoid[MetricData].empty
}
}
}
}
So most of the execution uses IO directly and lifts using StateT.liftF. And in certain situations, I include some calls to recordMetric. At the end of it I've got a stream:
val mainStream: Stream[MetricsIO, Bar] = ...
And I want to periodically, say every minute or so, dump the metrics, so I tried:
val scheduler: Scheduler = ...
val sendStream =
scheduler
.awakeEvery[MetricsIO](FiniteDuration(1, TimeUnit.Minutes))
.evalMap(_ => Foo.sendMetrics)
val result = mainStream.concurrently(sendStream).compile.drain
And then I do the usual top level program stuff of calling run with the start state and then calling unsafeRunSync.
The issue is, I only ever see empty metrics! I suspect it's something to with my monoid implicitly providing empty metrics to sendStream but I can't quite figure out why that should be or how to fix it. Maybe there's a way I can "interleave" these sendMetrics calls into the main stream instead?
Edit: here's a minimal complete runnable example:
import fs2._
import cats.implicits._
import cats.data._
import cats.effect._
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import scala.concurrent.duration._
val sec = Executors.newScheduledThreadPool(4)
implicit val ec = ExecutionContext.fromExecutorService(sec)
type F[A] = StateT[IO, List[String], A]
val slowInts = Stream.unfoldEval[F, Int, Int](1) { n =>
StateT(state => IO {
Thread.sleep(500)
val message = s"hello $n"
val newState = message :: state
val result = Some((n, n + 1))
(newState, result)
})
}
val ticks = Scheduler.fromScheduledExecutorService(sec).fixedDelay[F](FiniteDuration(1, SECONDS))
val slowIntsPeriodicallyClearedState = slowInts.either(ticks).evalMap[Int] {
case Left(n) => StateT.liftF(IO(n))
case Right(_) => StateT(state => IO {
println(state)
(List.empty, -1)
})
}
Now if I do:
slowInts.take(10).compile.drain.run(List.empty).unsafeRunSync
Then I get the expected result - the state properly accumulates into the output. But if I do:
slowIntsPeriodicallyClearedState.take(10).compile.drain.run(List.empty).unsafeRunSync
Then I see an empty list consistently printed out. I would have expected partial lists (approx. 2 elements) printed out.
StateT is not safe to use with effect types, because it's not safe in the face of concurrent access. Instead, consider using a Ref (from either fs2 or cats-effect, depending what version).
Something like this:
def slowInts(ref: Ref[IO, Int]) = Stream.unfoldEval[F, Int, Int](1) { n =>
val message = s"hello $n"
ref.modify(message :: _) *> IO {
Thread.sleep(500)
val result = Some((n, n + 1))
result
}
}
val ticks = Scheduler.fromScheduledExecutorService(sec).fixedDelay[IO](FiniteDuration(1, SECONDS))
def slowIntsPeriodicallyClearedState(ref: Ref[IO, Int] =
slowInts.either(ticks).evalMap[Int] {
case Left(n) => IO.pure(n)
case Right(_) =>
ref.modify(_ => Nil).flatMap { case Change(previous, now) =>
IO(println(now)).as(-1)
}
}
Let's say we have a fake data source which will return data it holds in batch
class DataSource(size: Int) {
private var s = 0
implicit val g = scala.concurrent.ExecutionContext.global
def getData(): Future[List[Int]] = {
s = s + 1
Future {
Thread.sleep(Random.nextInt(s * 100))
if (s <= size) {
List.fill(100)(s)
} else {
List()
}
}
}
object Test extends App {
val source = new DataSource(100)
implicit val g = scala.concurrent.ExecutionContext.global
def process(v: List[Int]): Unit = {
println(v)
}
def next(f: (List[Int]) => Unit): Unit = {
val fut = source.getData()
fut.onComplete {
case Success(v) => {
f(v)
v match {
case h :: t => next(f)
}
}
}
}
next(process)
Thread.sleep(1000000000)
}
I have mine, the problem here is some portion is more not pure. Ideally, I would like to wrap the Future for each batch into a big future, and the wrapper future success when last batch returned 0 size list? My situation is a little from this post, the next() there is synchronous call while my is also async.
Or is it ever possible to do what I want? Next batch will only be fetched when the previous one is resolved in the end whether to fetch the next batch depends on the size returned?
What's the best way to walk through this type of data sources? Are there any existing Scala frameworks that provide the feature I am looking for? Is play's Iteratee, Enumerator, Enumeratee the right tool? If so, can anyone provide an example on how to use those facilities to implement what I am looking for?
Edit----
With help from chunjef, I had just tried out. And it actually did work out for me. However, there was some small change I made based on his answer.
Source.fromIterator(()=>Iterator.continually(source.getData())).mapAsync(1) (f=>f.filter(_.size > 0))
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
However, can someone give comparison between Akka Stream and Play Iteratee? Does it worth me also try out Iteratee?
Code snip 1:
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
Code snip 2: Assuming the getData depends on some other output of another flow, and I would like to concat it with the below flow. However, it yield too many files open error. Not sure what would cause this error, the mapAsync has been limited to 1 as its throughput if I understood correctly.
Flow[Int].mapConcat[Future[List[Int]]](c => {
Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)
The following is one way to achieve the same behavior with Akka Streams, using your DataSource class:
import scala.concurrent.Future
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
object StreamsExample extends App {
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
val ds = new DataSource(100)
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
}
class DataSource(size: Int) {
...
}
A simplified line-by-line overview:
line 1: Creates a stream source that continually calls ds.getData if there is downstream demand.
line 2: mapAsync is a way to deal with stream elements that are Futures. In this case, the stream elements are of type Future[List[Int]]. The argument 1 is the level of parallelism: we specify 1 here because DataSource internally uses a mutable variable, and a parallelism level greater than one could produce unexpected results. identity is shorthand for x => x, which basically means that for each Future, we pass its result downstream without transforming it.
line 3: Essentially, ds.getData is called as long as the result of the Future is a non-empty List[Int]. If an empty List is encountered, processing is terminated.
line 4: runForeach here takes a function List[Int] => Unit and invokes that function for each stream element.
Ideally, I would like to wrap the Future for each batch into a big future, and the wrapper future success when last batch returned 0 size list?
I think you are looking for a Promise.
You would set up a Promise before you start the first iteration.
This gives you promise.future, a Future that you can then use to follow the completion of everything.
In your onComplete, you add a case _ => promise.success().
Something like
def loopUntilDone(f: (List[Int]) => Unit): Future[Unit] = {
val promise = Promise[Unit]
def next(): Unit = source.getData().onComplete {
case Success(v) =>
f(v)
v match {
case h :: t => next()
case _ => promise.success()
}
case Failure(e) => promise.failure(e)
}
// get going
next(f)
// return the Future for everything
promise.future
}
// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
You are probably looking for a reactive streams library. My personal favorite (and one I'm most familiar with) is Monix. This is how it will work with DataSource unchanged
import scala.concurrent.duration.Duration
import scala.concurrent.Await
import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global
object Test extends App {
val source = new DataSource(100)
val completed = // <- this is Future[Unit], completes when foreach is done
Observable.repeat(Observable.fromFuture(source.getData()))
.flatten // <- Here it's Observable[List[Int]], it has collection-like methods
.takeWhile(_.nonEmpty)
.foreach(println)
Await.result(completed, Duration.Inf)
}
I just figured out that by using flatMapConcat can achieve what I wanted to achieve. There is no point to start another question as I have had the answer already. Put my sample code here just in case someone is looking for similar answer.
This type of API is very common for some integration between traditional Enterprise applications. The DataSource is to mock the API while the object App is to demonstrate how the client code can utilize Akka Stream to consume the APIs.
In my small project the API was provided in SOAP, and I used scalaxb to transform the SOAP to Scala async style. And with the client calls demonstrated in the object App, we can consume the API with AKKA Stream. Thanks for all for the help.
class DataSource(size: Int) {
private var transactionId: Long = 0
private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
implicit val g = scala.concurrent.ExecutionContext.global
case class TransactionId(id: Long)
case class ReadCursorId(id: Long)
def startTransaction(): Future[TransactionId] = {
Future {
synchronized {
transactionId += transactionId
}
val t = TransactionId(transactionId)
transactionCursorMap.update(t, Set(ReadCursorId(0)))
t
}
}
def createCursorId(t: TransactionId): ReadCursorId = {
synchronized {
val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
val currentId = c.foldLeft(0l) { (acc, a) => acc.max(a.id) }
val cId = ReadCursorId(currentId + 1)
transactionCursorMap.update(t, c + cId)
cursorIteratorMap.put(cId, createIterator)
cId
}
}
def createIterator(): Iterator[List[Int]] = {
(for {i <- 1 to 100} yield List.fill(100)(i)).toIterator
}
def startRead(t: TransactionId): Future[ReadCursorId] = {
Future {
createCursorId(t)
}
}
def getData(cursorId: ReadCursorId): Future[List[Int]] = {
synchronized {
Future {
Thread.sleep(Random.nextInt(100))
cursorIteratorMap.get(cursorId) match {
case Some(i) => i.next()
case _ => List()
}
}
}
}
}
object Test extends App {
val source = new DataSource(10)
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
implicit val g = scala.concurrent.ExecutionContext.global
//
// def process(v: List[Int]): Unit = {
// println(v)
// }
//
// def next(f: (List[Int]) => Unit): Unit = {
// val fut = source.getData()
// fut.onComplete {
// case Success(v) => {
// f(v)
// v match {
//
// case h :: t => next(f)
//
// }
// }
//
// }
//
// }
//
// next(process)
//
// Thread.sleep(1000000000)
val s = Source.fromFuture(source.startTransaction())
.map { e =>
source.startRead(e)
}
.mapAsync(1)(identity)
.flatMapConcat(
e => {
Source.fromIterator(() => Iterator.continually(source.getData(e)))
})
.mapAsync(5)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
/*
val done = Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runFold(List[List[Int]]()) { (acc, r) =>
// println("=======" + acc + r)
r :: acc
}
done.onSuccess {
case e => {
e.foreach(println)
}
}
done.onComplete(_ => system.terminate())
*/
}
a simple code sample that describes my problem:
import scala.util._
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global
class LoserException(msg: String, dice: Int) extends Exception(msg) { def diceRoll: Int = dice }
def aPlayThatMayFail: Future[Int] = {
Thread.sleep(1000) //throwing a dice takes some time...
//throw a dice:
(1 + Random.nextInt(6)) match {
case 6 => Future.successful(6) //I win!
case i: Int => Future.failed(new LoserException("I did not get 6...", i))
}
}
def win(prefix: String): String = {
val futureGameLog = aPlayThatMayFail
futureGameLog.onComplete(t => t match {
case Success(diceRoll) => "%s, and finally, I won! I rolled %d !!!".format(prefix, diceRoll)
case Failure(e) => e match {
case ex: LoserException => win("%s, and then i got %d".format(prefix, ex.diceRoll))
case _: Throwable => "%s, and then somebody cheated!!!".format(prefix)
}
})
"I want to do something like futureGameLog.waitForRecursiveResult, using Await.result or something like that..."
}
win("I started playing the dice")
this simple example illustrates what i want to do. basically, if to put it in words, i want to wait for a result for some computation, when i compose different actions on previous success or failed attampts.
so how would you implement the win method?
my "real world" problem, if it makes any difference, is using dispatch for asynchronous http calls, where i want to keep making http calls whenever the previous one ends, but actions differ on wether the previous http call succeeded or not.
You can recover your failed future with a recursive call:
def foo(x: Int) = x match {
case 10 => Future.successful(x)
case _ => Future.failed[Int](new Exception)
}
def bar(x: Int): Future[Int] = {
foo(x) recoverWith { case _ => bar(x+1) }
}
scala> bar(0)
res0: scala.concurrent.Future[Int] = scala.concurrent.impl.Promise$DefaultPromise#64d6601
scala> res0.value
res1: Option[scala.util.Try[Int]] = Some(Success(10))
recoverWith takes a PartialFunction[Throwable,scala.concurrent.Future[A]] and returns a Future[A]. You should be careful though, because it will use quite some memory when it does lots of recursive calls here.
As drexin answered the part about exception handling and recovering, let me try and answer the part about a recursive function involving futures. I believe using a Promise will help you achieve your goal. The restructured code would look like this:
def win(prefix: String): String = {
val prom = Promise[String]()
def doWin(p:String) {
val futureGameLog = aPlayThatMayFail
futureGameLog.onComplete(t => t match {
case Success(diceRoll) => prom.success("%s, and finally, I won! I rolled %d !!!".format(prefix, diceRoll))
case Failure(e) => e match {
case ex: LoserException => doWin("%s, and then i got %d".format(prefix, ex.diceRoll))
case other => prom.failure(new Exception("%s, and then somebody cheated!!!".format(prefix)))
}
})
}
doWin(prefix)
Await.result(prom.future, someTimeout)
}
Now this won't be true recursion in the sense that it will be building up one long stack due to the fact that the futures are async, but it is similar to recursion in spirit. Using the promise here gives you something to block against while the recursion does it's thing, blocking the caller from what's happening behind the scene.
Now, if I was doing this, I would probable redefine things like so:
def win(prefix: String): Future[String] = {
val prom = Promise[String]()
def doWin(p:String) {
val futureGameLog = aPlayThatMayFail
futureGameLog.onComplete(t => t match {
case Success(diceRoll) => prom.success("%s, and finally, I won! I rolled %d !!!".format(prefix, diceRoll))
case Failure(e) => e match {
case ex: LoserException => doWin("%s, and then i got %d".format(prefix, ex.diceRoll))
case other => prom.failure(new Exception("%s, and then somebody cheated!!!".format(prefix)))
}
})
}
doWin(prefix)
prom.future
}
This way you can defer the decision on whether to block or use async callbacks to the caller of this function. This is more flexible, but it also exposes the caller to the fact that you are doing async computations and I'm not sure that is going to be acceptable for your scenario. I'll leave that decision up to you.
This works for me:
def retryWithFuture[T](f: => Future[T],retries:Int, delay:FiniteDuration) (implicit ec: ExecutionContext, s: Scheduler): Future[T] ={
f.recoverWith { case _ if retries > 0 => after[T](delay,s)(retryWithFuture[T]( f , retries - 1 , delay)) }
}
I've problems understanding where to put type informations in scala, and how to put it. Here I create several sequences of Actors and I don't type them. Even if I had to, I wouldn't know which type of sequence map produces to give them the proper type.
Then later when the compiler yells at me because I'm trying to sum Anys, I've no idea where to begin filling in the gaps.
Here is my code, I tried to minimize it while still letting the necessary info available.
object Actors {
def main(args: Array[String]) {
val array = randomArray(5)
val master = new Master(array, 5)
master.start
}
def randomArray(length: Int): Array[Int] = {
val generator = new Random
new Array[Int](length) map((_:Int) => generator nextInt)
}
}
class Master(array: Array[Int], slavesNumber: Int) extends Actor {
def act () {
val slaves = (1 to slavesNumber).map(_ => new Slave)
slaves.foreach(s => s.start)
val futures = slaves.map(s => s !! Work(array))
val results = awaitAll(3000, futures:_*)
val res2 = results.flatMap(x => x)
println((0 /: res2)(_+_))
}
}
class Slave() extends Actor {
def act () {
Actor.loop {
receive {
case Work(slice) =>
reply((slice :\ 0)(_+_))
}
}
}
}
I'd appreciate too some good pointers towards comprehensive doc on the matter.
The object that are passed between actors are not typed, actors have to filter the object themselves -- as you already do in the Slave actor. As you can see, !! is defined as
def !!(msg: Any): Future[Any]
so there is no type information in the returned Future. Probably the easiest solution is to replace the line var res2 .. with
val res2 = results collect {case Some(y:Int) => y}
this filters out just those Some results that are of type Int.