How to read only Successful values from a Seq of Futures - scala

I am learning akka/scala and am trying to read only those Futures that succeeded from a Seq[Future[Int]] but cant get anything to work.
I simulated an array of 10 Future[Int] some of which fail depending on the value FailThreshold takes (all fail for 10 and none fail for 0).
I then try to read them into an ArrayBuffer (could not find a way to return immutable structure with the values).
Also, there isn't a filter on Success/Failure so had to run an onComplete on each future and update buffer as a side-effect.
Even when the FailThreshold=0 and the Seq has all Future set to Success, the array buffer is sometimes empty and different runs return array of different sizes.
I tried a few other suggestions from the web like using Future.sequence on the list but this throws exception if any of future variables fail.
import akka.actor._
import akka.pattern.ask
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Timeout, Failure, Success}
import concurrent.ExecutionContext.Implicits.global
case object AskNameMessage
implicit val timeout = Timeout(5, SECONDS)
val FailThreshold = 0
class HeyActor(num: Int) extends Actor {
def receive = {
case AskNameMessage => if (num<FailThreshold) {Thread.sleep(1000);sender ! num} else sender ! num
}
}
class FLPActor extends Actor {
def receive = {
case t: IndexedSeq[Future[Int]] => {
println(t)
val b = scala.collection.mutable.ArrayBuffer.empty[Int]
t.foldLeft( b ){ case (bf,ft) =>
ft.onComplete { case Success(v) => bf += ft.value.get.get }
bf
}
println(b)
}
}
}
val system = ActorSystem("AskTest")
val flm = (0 to 10).map( (n) => system.actorOf(Props(new HeyActor(n)), name="futureListMake"+(n)) )
val flp = system.actorOf(Props(new FLPActor), name="futureListProcessor")
// val delay = akka.pattern.after(500 millis, using=system.scheduler)(Future.failed( throw new IllegalArgumentException("DONE!") ))
val delay = akka.pattern.after(500 millis, using=system.scheduler)(Future.successful(0))
val seqOfFtrs = (0 to 10).map( (n) => Future.firstCompletedOf( Seq(delay, flm(n) ? AskNameMessage) ).mapTo[Int] )
flp ! seqOfFtrs
The receive in FLPActor mostly gets
Vector(Future(Success(0)), Future(Success(1)), Future(Success(2)), Future(Success(3)), Future(Success(4)), Future(Success(5)), Future(Success(6)), Future(Success(7)), Future(Success(8)), Future(Success(9)), Future(Success(10)))
but the array buffer b has varying number of values and empty at times.
Can someone please point me to gaps here,
why would the array buffer have varying sizes even when all Future have resolved to Success,
what is the correct pattern to use when we want to ask different actors with TimeOut and use only those asks that have successfully returned for further processing.

Instead of directly sending the IndexedSeq[Future[Int]], you should transform to Future[IndexedSeq[Int]] and then pipe it to the next actor. You don't send the Futures directly to an actor. You have to pipe it.
HeyActor can stay unchanged.
After
val seqOfFtrs = (0 to 10).map( (n) => Future.firstCompletedOf( Seq(delay, flm(n) ? AskNameMessage) ).mapTo[Int] )
do a recover, and use Future.sequence to turn it into one Future:
val oneFut = Future.sequence(seqOfFtrs.map(f=>f.map(Some(_)).recover{ case (ex: Throwable) => None})).map(_.flatten)
If you don't understand the business with Some, None, and flatten, then make sure you understand the Option type. One way to remove values from a sequence is to map values in the sequence to Option (either Some or None) and then to flatten the sequence. The None values are removed and the Some values are unwrapped.
After you have transformed your data into a single Future, pipe it over to FLPActor:
oneFut pipeTo flp
FLPActor should be rewritten with the following receive function:
def receive = {
case printme: IndexedSeq[Int] => println(printme)
}
In Akka, modifying some state in the main thread of your actor from a Future or the onComplete of a Future is a big no-no. In the worst case, it results in race conditions. Remember that each Future runs on its own thread, so running a Future inside an actor means you have concurrent work being done in different threads. Having the Future directly modify some state in your actor while the actor is also processing some state is a recipe for disaster. In Akka, you process all changes to state directly in the primary thread of execution of the main actor. If you have some work done in a Future and need to access that work from the main thread of an actor, you pipe it to that actor. The pipeTo pattern is functional, correct, and safe for accessing the finished computation of a Future.
To answer your question about why FLPActor is not printing out the IndexedSeq correctly: you are printing out the ArrayBuffer before your Futures have been completed. onComplete isn't the right idiom to use in this case, and you should avoid it in general as it isn't good functional style.
Don't forget the import akka.pattern.pipe for the pipeTo syntax.

Related

Akka Stream return object from Sink

I've got a SourceQueue. When I offer an element to this I want it to pass through the Stream and when it reaches the Sink have the output returned to the code that offered this element (similar as Sink.head returns an element to the RunnableGraph.run() call).
How do I achieve this? A simple example of my problem would be:
val source = Source.queue[String](100, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.ReturnTheStringSomehow
val graph = source.via(flow).to(sink).run()
val x = graph.offer("foo")
println(x) // Output should be "Modified foo"
val y = graph.offer("bar")
println(y) // Output should be "Modified bar"
val z = graph.offer("baz")
println(z) // Output should be "Modified baz"
Edit: For the example I have given in this question Vladimir Matveev provided the best answer. However, it should be noted that this solution only works if the elements are going into the sink in the same order they were offered to the source. If this cannot be guaranteed the order of the elements in the sink may differ and the outcome might be different from what is expected.
I believe it is simpler to use the already existing primitive for pulling values from a stream, called Sink.queue. Here is an example:
val source = Source.queue[String](128, OverflowStrategy.fail)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.queue[String]().withAttributes(Attributes.inputBuffer(1, 1))
val (sourceQueue, sinkQueue) = source.via(flow).toMat(sink)(Keep.both).run()
def getNext: String = Await.result(sinkQueue.pull(), 1.second).get
sourceQueue.offer("foo")
println(getNext)
sourceQueue.offer("bar")
println(getNext)
sourceQueue.offer("baz")
println(getNext)
It does exactly what you want.
Note that setting the inputBuffer attribute for the queue sink may or may not be important for your use case - if you don't set it, the buffer will be zero-sized and the data won't flow through the stream until you invoke the pull() method on the sink.
sinkQueue.pull() yields a Future[Option[T]], which will be completed successfully with Some if the sink receives an element or with a failure if the stream fails. If the stream completes normally, it will be completed with None. In this particular example I'm ignoring this by using Option.get but you would probably want to add custom logic to handle this case.
Well, you know what offer() method returns if you take a look at its definition :) What you can do is to create Source.queue[(Promise[String], String)], create helper function that pushes pair to stream via offer, make sure offer doesn't fail because queue might be full, then complete promise inside your stream and use future of the promise to catch completion event in external code.
I do that to throttle rate to external API used from multiple places of my project.
Here is how it looked in my project before Typesafe added Hub sources to akka
import scala.concurrent.Promise
import scala.concurrent.Future
import java.util.concurrent.ConcurrentLinkedDeque
import akka.stream.scaladsl.{Keep, Sink, Source}
import akka.stream.{OverflowStrategy, QueueOfferResult}
import scala.util.Success
private val queue = Source.queue[(Promise[String], String)](100, OverflowStrategy.backpressure)
.toMat(Sink.foreach({ case (p, param) =>
p.complete(Success(param.reverse))
}))(Keep.left)
.run
private val futureDeque = new ConcurrentLinkedDeque[Future[String]]()
private def sendQueuedRequest(request: String): Future[String] = {
val p = Promise[String]
val offerFuture = queue.offer(p -> request)
def addToQueue(future: Future[String]): Future[String] = {
futureDeque.addLast(future)
future.onComplete(_ => futureDeque.remove(future))
future
}
offerFuture.flatMap {
case QueueOfferResult.Enqueued =>
addToQueue(p.future)
}.recoverWith {
case ex =>
val first = futureDeque.pollFirst()
if (first != null)
addToQueue(first.flatMap(_ => sendQueuedRequest(request)))
else
sendQueuedRequest(request)
}
}
I realize that blocking synchronized queue may be bottleneck and may grow indefinitely but because API calls in my project are made only from other akka streams which are backpressured I never have more than dozen items in futureDeque. Your situation may differ.
If you create MergeHub.source[(Promise[String], String)]() instead you'll get reusable sink. Thus every time you need to process item you'll create complete graph and run it. In that case you won't need hacky java container to queue requests.

Result of map and mapAsync(1) in Akka Stream

My code which uses mapAync(1) doesn't work what I want it to do. But when I changed the mapAsync(1) to map by using Await.result, it works. So I have a question.
Does the following (A) Use map and (B) use mapAsync(1) yield the same result at anytime?
// (A) Use map
someSource
.map{r =>
val future = makeFuture(r) // returns the same future if r is the same
Await.result(future, Duration.Inf)
}
// (B) Use mapAsync(1)
someSource
.mapAsync(1){r =>
val future = makeFuture(r) // returns the same future if r is the same
future
}
Actually, I want to paste my real code, but it is too long to paste and has some dependencies of my original stages.
While semantically the type of both streams ends up being the same (Source[Int, NotUsed]), the style displayed in example (A) is very very bad – please don't block (Await) inside streams.
Such cases are exactly the use case for mapAsync. Your operation returns a Future[T], and you want to push that value downwards through the stream once the future completes. Please note that there is no blocking in mapAsync, it schedules a callback to push the value of the future internally and does so once it completes.
To answer your question about "do they do the same thing?", technically yes but the first one will cause performance issues in the threadpool you're running on, avoid map+blocking when mapAsync can do the job.
These calls are semantically very similar, although blocking by using Await is probably not a good idea. The type signature of both these calls is, of course, the same (Source[Int, NotUsed]), and in many cases these calls will produce the same results (blocking aside). The following, for example, which includes scheduled futures and a non-default supervision strategy for failures, gives the same results for both map with an Await inside and mapAsync:
import akka.actor._
import akka.stream.ActorAttributes.supervisionStrategy
import akka.stream.Supervision.resumingDecider
import akka.stream._
import akka.stream.scaladsl._
import scala.concurrent._
import scala.concurrent.duration._
import scala.language.postfixOps
object Main {
def main(args: Array[String]) {
implicit val system = ActorSystem("TestSystem")
implicit val materializer = ActorMaterializer()
import scala.concurrent.ExecutionContext.Implicits.global
import system.scheduler
def makeFuture(r: Int) = {
akka.pattern.after(2 seconds, scheduler) {
if (r % 3 == 0)
Future.failed(new Exception(s"Failure for input $r"))
else
Future(r + 100)
}
}
val someSource = Source(1 to 20)
val mapped = someSource
.map { r =>
val future = makeFuture(r)
Await.result(future, Duration.Inf)
}.withAttributes(supervisionStrategy(resumingDecider))
val mappedAsync = someSource
.mapAsyncUnordered(1) { r =>
val future = makeFuture(r)
future
}.withAttributes(supervisionStrategy(resumingDecider))
mapped runForeach println
mappedAsync runForeach println
}
}
It is possible that your upstream code is relying on the blocking behaviour in your map call in some way. Can you produce a concise reproduction of the issue that you are seeing?

Clear onComplete functions list in Promise

This snippet
val some = Promise[Int]()
some.success(20)
val someFuture = some.future
someFuture.onSuccess {case i => println(i)}
someFuture.onComplete {case iTry => List(println(iTry.get*2), println(iTry.get*3))}
creates promise with list of 3 callbacks(List[CallbackRunnable]) on complete. Is there a way to clear this List or rewrite it?
This is kind of technically possible. But definitely not in the way that you want. If we execute a Future (or Promise) in one ExecutionContext, and the callback in another, we can kill the callback's ExecutionContext so that it can't complete. This works in the REPL (throws an exception somewhere), but is terrible idea to actually try in real code:
import scala.concurrent._
import java.util.concurrent.Executors
val ex1 = Executors.newFixedThreadPool(1)
val ex2 = Executors.newFixedThreadPool(1)
val ec1 = ExecutionContext.fromExecutorService(ex1)
val ec2 = ExecutionContext.fromExecutorService(ex2)
val f = Future { Thread.sleep(30000); println("done") }(ec1) // start in one ExecutionContext
f.onComplete { case t => println("onComplete done") }(ec2) // start callback in another ExecutionContext
ec2.shutdownNow
f.onComplete { case t => println("onComplete change") }(ec1) // start another callback in the original ExecutionContext
The first onComplete will not run (an exception is thrown in the background, which may or may not cause horrible things to happen elsewhere), but the second one does. But this is truly terrible, so don't do it.
Even if there was a construct to clear callbacks, it wouldn't be a good idea to use it.
someFuture.onComplete { case result => // do something important }
someFuture.clearCallbacks() // imaginary method to clear the callbacks
someFuture.onComplete { case result =>> // do something else }
The execution of this hypothetical code is non-deterministic. someFuture could complete before clearCallbacks is called, meaning both callbacks would get called, instead of just the second. But if it hasn't run yet, then only one callback will fire. There wouldn't be a nice way of determining that, which would lead to some truly horrible bugs.

Multiple Future calls in an Actor's receive method

I'm trying to make two external calls (to a Redis database) inside an Actor's receive method. Both calls return a Future and I need the result of the first Future inside the second.
I'm wrapping both calls inside a Redis transaction to avoid anyone else from modifying the value in the database while I'm reading it.
The internal state of the actor is updated based on the value of the second Future.
Here is what my current code looks like which I is incorrect because I'm updating the internal state of the actor inside a Future.onComplete callback.
I cannot use the PipeTo pattern because I need both both Future have to be in a transaction.
If I use Await for the first Future then my receive method will block.
Any idea how to fix this ?
My second question is related to how I'm using Futures. Is this usage of Futures below correct? Is there a better way of dealing with multiple Futures in general? Imagine if there were 3 or 4 Future each depending on the previous one.
import akka.actor.{Props, ActorLogging, Actor}
import akka.util.ByteString
import redis.RedisClient
import scala.concurrent.Future
import scala.util.{Failure, Success}
object GetSubscriptionsDemo extends App {
val akkaSystem = akka.actor.ActorSystem("redis-demo")
val actor = akkaSystem.actorOf(Props(new SimpleRedisActor("localhost", "dummyzset")), name = "simpleactor")
actor ! UpdateState
}
case object UpdateState
class SimpleRedisActor(ip: String, key: String) extends Actor with ActorLogging {
//mutable state that is updated on a periodic basis
var mutableState: Set[String] = Set.empty
//required by Future
implicit val ctx = context dispatcher
var rClient = RedisClient(ip)(context.system)
def receive = {
case UpdateState => {
log.info("Start of UpdateState ...")
val tran = rClient.transaction()
val zf: Future[Long] = tran.zcard(key) //FIRST Future
zf.onComplete {
case Success(z) => {
//SECOND Future, depends on result of FIRST Future
val rf: Future[Seq[ByteString]] = tran.zrange(key, z - 1, z)
rf.onComplete {
case Success(x) => {
//convert ByteString to UTF8 String
val v = x.map(_.utf8String)
log.info(s"Updating state with $v ")
//update actor's internal state inside callback for a Future
//IS THIS CORRECT ?
mutableState ++ v
}
case Failure(e) => {
log.warning("ZRANGE future failed ...", e)
}
}
}
case Failure(f) => log.warning("ZCARD future failed ...", f)
}
tran.exec()
}
}
}
The compiles but when I run it gets struck.
2014-08-07 INFO [redis-demo-akka.actor.default-dispatcher-3] a.e.s.Slf4jLogger - Slf4jLogger started
2014-08-07 04:38:35.106UTC INFO [redis-demo-akka.actor.default-dispatcher-3] e.c.s.e.a.g.SimpleRedisActor - Start of UpdateState ...
2014-08-07 04:38:35.134UTC INFO [redis-demo-akka.actor.default-dispatcher-8] r.a.RedisClientActor - Connect to localhost/127.0.0.1:6379
2014-08-07 04:38:35.172UTC INFO [redis-demo-akka.actor.default-dispatcher-4] r.a.RedisClientActor - Connected to localhost/127.0.0.1:6379
UPDATE 1
In order to use pipeTo pattern I'll need access to the tran and the FIRST Future (zf) in the actor where I'm piping the Future to because the SECOND Future depends on the value (z) of FIRST.
//SECOND Future, depends on result of FIRST Future
val rf: Future[Seq[ByteString]] = tran.zrange(key, z - 1, z)
Without knowing too much about the redis client you are using, I can offer an alternate solution that should be cleaner and won't have issues with closing over mutable state. The idea is to use a master/worker kind of situation, where the master (the SimpleRedisActor) receives the request to do the work and then delegates off to a worker that performs the work and responds with the state to update. That solution would look something like this:
object SimpleRedisActor{
case object UpdateState
def props(ip:String, key:String) = Props(classOf[SimpleRedisActor], ip, key)
}
class SimpleRedisActor(ip: String, key: String) extends Actor with ActorLogging {
import SimpleRedisActor._
import SimpleRedisWorker._
//mutable state that is updated on a periodic basis
var mutableState: Set[String] = Set.empty
val rClient = RedisClient(ip)(context.system)
def receive = {
case UpdateState =>
log.info("Start of UpdateState ...")
val worker = context.actorOf(SimpleRedisWorker.props)
worker ! DoWork(rClient, key)
case WorkResult(result) =>
mutableState ++ result
case FailedWorkResult(ex) =>
log.error("Worker got failed work result", ex)
}
}
object SimpleRedisWorker{
case class DoWork(client:RedisClient, key:String)
case class WorkResult(result:Seq[String])
case class FailedWorkResult(ex:Throwable)
def props = Props[SimpleRedisWorker]
}
class SimpleRedisWorker extends Actor{
import SimpleRedisWorker._
import akka.pattern.pipe
import context._
def receive = {
case DoWork(client, key) =>
val trans = client.transaction()
trans.zcard(key) pipeTo self
become(waitingForZCard(sender, trans, key) orElse failureHandler(sender, trans))
}
def waitingForZCard(orig:ActorRef, trans:RedisTransaction, key:String):Receive = {
case l:Long =>
trans.zrange(key, l -1, l) pipeTo self
become(waitingForZRange(orig, trans) orElse failureHandler(orig, trans))
}
def waitingForZRange(orig:ActorRef, trans:RedisTransaction):Receive = {
case s:Seq[ByteString] =>
orig ! WorkResult(s.map(_.utf8String))
finishAndStop(trans)
}
def failureHandler(orig:ActorRef, trans:RedisTransaction):Receive = {
case Status.Failure(ex) =>
orig ! FailedWorkResult(ex)
finishAndStop(trans)
}
def finishAndStop(trans:RedisTransaction) {
trans.exec()
context stop self
}
}
The worker starts the transaction and then makes calls into redis and ultimately completes the transaction before stopping itself. When it calls redis, it gets the future and pipes back to itself for the continuation of the processing, changing the receive method between as a mechanism of showing progressing through its states. In a model like this (which I suppose is somewhat similar to the error kernal pattern), the master owns and protects the state, delegating the "risky" work off to a child who can figure out what the change for the state should be, but the changing is still owned by the master.
Now again, I have no idea about the capabilities of the redis client you are using and if it is safe enough to even do this kind of stuff, but that's not really the point. The point was to show a safer structure for doing something like this that involves futures and state that needs to be changed safely.
Using the callback to mutate internal state is not a good idea, excerpt from the akka docs:
When using future callbacks, such as onComplete, onSuccess, and onFailure, inside actors you need to carefully avoid closing over the containing actor’s reference, i.e. do not call methods or access mutable state on the enclosing actor from within the callback.
Why do you worry about pipeTo and transactions?
Not sure how redis transactions work, but I would guess that the transaction does not encompass the onComplete callback on the second future anyways.
I would put the state into a separate actor which you pipe the future too. This way you have a separate mailbox, and the ordering there will be the same as the ordering of the messages that came in to modify the state. Also if any read requests come in, they will also be put in the correct order.
Edit to respond to edited question: Ok, so you don't want to pipe the first future, that makes sense, and should be no problem as the first callback is harmless. The callback of the second future is the problem, as it manipulates the state. But this future can be pipe without the need for access to the transaction.
So basically my suggestion is:
val firstFuture = tran.zcard
firstFuture.onComplete {
val secondFuture = tran.zrange
secondFuture pipeTo stateActor
}
With stateActor containing the mutable state.

Simple Scala actor question

I'm sure this is a very simple question, but embarrassed to say I can't get my head around it:
I have a list of values in Scala.
I would like to use use actors to make some (external) calls with each value, in parallel.
I would like to wait until all values have been processed, and then proceed.
There's no shared values being modified.
Could anyone advise?
Thanks
There's an actor-using class in Scala that's made precisely for this kind of problem: Futures. This problem would be solved like this:
// This assigns futures that will execute in parallel
// In the example, the computation is performed by the "process" function
val tasks = list map (value => scala.actors.Futures.future { process(value) })
// The result of a future may be extracted with the apply() method, which
// will block if the result is not ready.
// Since we do want to block until all results are ready, we can call apply()
// directly instead of using a method such as Futures.awaitAll()
val results = tasks map (future => future.apply())
There you go. Just that.
Create workers and ask them for futures using !!; then read off the results (which will be calculated and come in in parallel as they're ready; you can then use them). For example:
object Example {
import scala.actors._
class Worker extends Actor {
def act() { Actor.loop { react {
case s: String => reply(s.length)
case _ => exit()
}}}
}
def main(args: Array[String]) {
val arguments = args.toList
val workers = arguments.map(_ => (new Worker).start)
val futures = for ((w,a) <- workers zip arguments) yield w !! a
val results = futures.map(f => f() match {
case i: Int => i
case _ => throw new Exception("Whoops--didn't expect to get that!")
})
println(results)
workers.foreach(_ ! None)
}
}
This does a very inexpensive computation--calculating the length of a string--but you can put something expensive there to make sure it really does happen in parallel (the last thing that case of the act block should be to reply with the answer). Note that we also include a case for the worker to shut itself down, and when we're all done, we tell the workers to shut down. (In this case, any non-string shuts down the worker.)
And we can try this out to make sure it works:
scala> Example.main(Array("This","is","a","test"))
List(4, 2, 1, 4)