Chaining context through akka streams - scala

I'm converting some C# code to Scala and Akka Streams.
My C# code looks something like this:
Task<Result1> GetPartialResult1Async(Request request) ...
Task<Result2> GetPartialResult2Async(Request request) ...
async Task<Result> GetResultAsync(Request request)
{
var result1 = await GetPartialResult1Async(request);
var result2 = await GetPartialResult2Async(request);
return new Result(request, result1, result2);
}
Now for the akka streams. Instead of having a function from Request to a Task of a result, I have flows from a Request to a Result.
So I already have the following two flows:
val partialResult1Flow: Flow[Request, Result1, NotUsed] = ...
val partialResult2Flow: Flow[Request, Result2, NotUsed] = ...
However I can't see how to combine them into a complete flow, since by calling via on the first flow we lose the original request, and by calling via on the second flow we lose the result of the first flow.
So I've created a WithState monad which looks something like this:
case class WithState[+TState, +TValue](value: TValue, state: TState) {
def map[TResult](func: TValue => TResult): WithState[TState, TResult] = {
WithState(func(value), state)
}
... bunch more helper functions go here
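// For example, one such helper might look like this (a hypothetical
// reconstruction, since the original elides them; the answer below calls
// a mapAsync on WithState): map the value asynchronously while threading
// the state through unchanged. Assumes scala.concurrent imports.
def mapAsync[TResult](func: TValue => Future[TResult])(implicit ec: ExecutionContext): Future[WithState[TState, TResult]] = {
func(value).map(WithState(_, state))
}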
}
Then I'm rewriting my original flows to look like this:
def partialResult1Flow[TState]: Flow[WithState[TState, Request], WithState[TState, Result1], NotUsed] = ...
def partialResult2Flow[TState]: Flow[WithState[TState, Request], WithState[TState, Result2], NotUsed] = ...
and using them like this:
val flow = Flow[Request]
.map(x => WithState(x, x))
.via(partialResult1Flow)
.map(x => WithState(x.state, (x.state, x.value)))
.via(partialResult2Flow)
.map(x => Result(x.state._1, x.state._2, x.value))
Now this works, but of course I can't guarantee how flow will be used. So I really ought to make it take a State parameter:
def flow[TState] = Flow[WithState[TState, Request]]
.map(x => WithState(x.value, (x.state, x.value)))
.via(partialResult1Flow)
.map(x => WithState(x.state._2, (x.state, x.value)))
.via(partialResult2Flow)
.map(x => WithState(Result(x.state._1._2, x.state._2, x.value), x.state._1._1))
Now at this stage my code is getting extremely hard to read. I could clean it up by naming the functions, and using case classes instead of tuples etc. but fundamentally there's a lot of incidental complexity here, which is hard to avoid.
Am I missing something? Is this not a good use case for Akka streams? Is there some inbuilt way of doing this?

I don't have any fundamentally different way to do this than I described in the question.
However, the current flow can be significantly improved:
Stage 1: FlowWithContext
Instead of using a custom WithState monad, it's possible to use the built-in FlowWithContext.
The advantage of this is that you can use the standard operators on the flow, without needing to worry about transforming the WithState monad. Akka takes care of this for you.
So instead of
def partialResult1Flow[TState]: Flow[WithState[TState, Request], WithState[TState, Result1], NotUsed] =
Flow[WithState[TState, Request]].mapAsync(parallelism)(_.mapAsync(doRequest(_)))
We can write:
def partialResult1Flow[TState]: FlowWithContext[Request, TState, Result1, TState, NotUsed] =
FlowWithContext[Request, TState].mapAsync(parallelism)(doRequest(_))
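For example, plain operators only touch the data, and the context is propagated automatically. A minimal sketch (doubleFlow is an illustrative name, not from the original):
def doubleFlow[Ctx]: FlowWithContext[Int, Ctx, Int, Ctx, NotUsed] =
FlowWithContext[Int, Ctx].map(_ * 2) // the Ctx element rides along untouched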
Unfortunately though, whilst FlowWithContext is quite easy to write when you don't need to change the context, it's a little fiddly to use when you need to go via a stream which requires you to move some of your current data into the context (as ours does). In order to do that you need to convert to a Flow (using asFlow), and then back to a FlowWithContext using asFlowWithContext.
I found it easiest to just write the whole thing as a Flow in such cases, and convert to a FlowWithContext at the end.
For example:
def flow[TState]: FlowWithContext[Request, TState, Result, TState, NotUsed] =
Flow[(Request, TState)]
.map(x => (x._1, (x._1, x._2)))
.via(partialResult1Flow)
.map(x => (x._2._1, (x._2._1, x._1, x._2._2)))
.via(partialResult2Flow)
.map(x => (Result(x._2._1, x._2._2, x._1), x._2._3))
.asFlowWithContext((a: Request, b: TState) => (a,b))(_._2)
.map(_._1)
Is this any better?
In this particular case it's probably worse. In other cases, where you rarely need to change the context, it would be better. However, either way I would recommend using it, as it's built in, rather than relying on a custom monad.
Stage 2: viaUsing
In order to make this a bit more user friendly I created a viaUsing extension method for Flow and FlowWithContext:
import akka.stream.{FlowShape, Graph}
import akka.stream.scaladsl.{Flow, FlowWithContext}
object FlowExtensions {
implicit class FlowViaUsingOps[In, Out, Mat](val f: Flow[In, Out, Mat]) extends AnyVal {
def viaUsing[Out2, Using, Mat2](func: Out => Using)(flow: Graph[FlowShape[(Using, Out), (Out2, Out)], Mat2]) : Flow[In, (Out2, Out), Mat] =
f.map(x => (func(x), x)).via(flow)
}
implicit class FlowWithContextViaUsingOps[In, CtxIn, Out, CtxOut, Mat](val f: FlowWithContext[In, CtxIn, Out, CtxOut, Mat]) extends AnyVal {
def viaUsing[Out2, Using, Mat2](func: Out => Using)(flow: Graph[FlowShape[(Using, (Out, CtxOut)), (Out2, (Out, CtxOut))], Mat2]):
FlowWithContext[In, CtxIn, (Out2, Out), CtxOut, Mat] =
f
.asFlow
.map(x => (func(x._1), (x._1, x._2)))
.via(flow)
.asFlowWithContext((a: In, b: CtxIn) => (a,b))(_._2._2)
.map(x => (x._1, x._2._1))
}
}
The purpose of viaUsing is to create the input for a FlowWithContext from the current output, whilst preserving your current output by passing it through the context. It results in a Flow whose output is a tuple of the output from the nested flow and the original output.
With viaUsing our example simplifies to:
def flow[TState]: FlowWithContext[Request, TState, Result, TState, NotUsed] =
FlowWithContext[Request, TState]
.viaUsing(x => x)(partialResult1Flow)
.viaUsing(x => x._2)(partialResult2Flow)
.map(x => Result(x._2._2, x._2._1, x._1))
I think this is a significant improvement. I've made a request to add viaUsing to Akka instead of relying on extension methods here.

I agree that using Akka Streams for backpressure is useful. However, I'm not convinced that modelling the calculation of the partial results as streams is useful here. Having the 'inner' logic based on Futures and wrapping those in the mapAsync of your flow, so that backpressure applies to the whole operation as one unit, seems simpler and perhaps even better.
This is basically a boiled-down refactoring of Levi Ramsey's earlier excellent answer:
import scala.concurrent.{ ExecutionContext, Future }
import akka.NotUsed
import akka.stream._
import akka.stream.scaladsl._
case class Request()
case class Result1()
case class Result2()
case class Response(r: Request, r1: Result1, r2: Result2)
def partialResult1(req: Request): Future[Result1] = ???
def partialResult2(req: Request): Future[Result2] = ???
val system = akka.actor.ActorSystem()
implicit val ec: ExecutionContext = system.dispatcher
val flow: Flow[Request, Response, NotUsed] =
Flow[Request]
.mapAsync(parallelism = 12) { req =>
for {
res1 <- partialResult1(req)
res2 <- partialResult2(req)
} yield Response(req, res1, res2)
}
I would start with this, and only if you know you have reason to split partialResult1 and partialResult2 into separate stages introduce an intermediate step in the Flow. Depending on your requirements mapAsyncUnordered might be more suitable.
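If response order doesn't matter, a mapAsyncUnordered sketch (same case classes and stubs as above; note this version also runs the two partial results concurrently rather than sequentially):
val unorderedFlow: Flow[Request, Response, NotUsed] =
Flow[Request]
.mapAsyncUnordered(parallelism = 12) { req =>
partialResult1(req).zipWith(partialResult2(req))(Response(req, _, _))
}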

Disclaimer: I'm not totally familiar with C#'s async/await.
From what I've been able to glean from a quick perusal of the C# docs, Task<T> is a strictly (i.e. eagerly, not lazily) evaluated computation which, if successful, will eventually contain a T. The Scala equivalent of this is Future[T], where the equivalent of the C# code would be:
import scala.concurrent.{ ExecutionContext, Future }
def getPartialResult1Async(req: Request): Future[Result1] = ???
def getPartialResult2Async(req: Request): Future[Result2] = ???
def getResultAsync(req: Request)(implicit ectx: ExecutionContext): Future[Result] = {
val result1 = getPartialResult1Async(req)
val result2 = getPartialResult2Async(req)
result1.zipWith(result2) { (r1, r2) =>
new Result(req, r1, r2)
}
/* Could also:
* for {
* r1 <- result1
* r2 <- result2
* } yield { new Result(req, r1, r2) }
*
* Note that both the `result1.zipWith(result2)` and the above `for`
* construction may compute the two partial results simultaneously. If you
* want to ensure that the second partial result is computed after the first
* partial result is successfully computed:
* for {
* r1 <- getPartialResult1Async(req)
* r2 <- getPartialResult2Async(req)
* } yield new Result(req, r1, r2)
*/
}
No Akka Streams required for this particular case, but if you have some other need to use Akka Streams, you could express this as:
val actorSystem = ??? // In Akka Streams 2.6, you'd probably have this as an implicit val
val parallelism = ??? // Controls requests in flight
val flow = Flow[Request]
.mapAsync(parallelism) { req =>
import actorSystem.dispatcher
getPartialResult1Async(req).map { r1 => (req, r1) }
}
.mapAsync(parallelism) { tup =>
import actorSystem.dispatcher
getPartialResult2Async(tup._1).map { r2 =>
new Result(tup._1, tup._2, r2)
}
}
/* Given the `getResultAsync` function in the previous snippet, you could also:
* val flow = Flow[Request].mapAsync(parallelism) { req =>
* getResultAsync(req)(actorSystem.dispatcher)
* }
*/
One advantage of the Future-based implementation is that it's pretty easy to integrate with whatever Scala abstraction of concurrency/parallelism you want to use in a given context (e.g. cats, Akka Streams, plain Akka). My general instinct for an Akka Streams integration would be in the direction of the three-liner in the comment at the end of the second code block.

Related

How to Promise.allSettled with Scala futures?

I have two scala futures. I want to perform an action once both are completed, regardless of whether they were completed successfully. (Additionally, I want the ability to inspect those results at that time.)
In Javascript, this is Promise.allSettled.
Does Scala offer a simple way to do this?
One last wrinkle, if it matters: I want to do this in a JRuby application.
You can use the transform method to create a Future that will always succeed and return the result or the error as a Try object.
def toTry[A](future: Future[A])(implicit ec: ExecutionContext): Future[Try[A]] =
future.transform(x => Success(x))
To combine two Futures into one, you can use zip:
def settle2[A, B](fa: Future[A], fb: Future[B])(implicit ec: ExecutionContext)
: Future[(Try[A], Try[B])] =
toTry(fa).zip(toTry(fb))
If you want to combine an arbitrary number of Futures this way, you can use Future.traverse:
def allSettled[A](futures: List[Future[A]])(implicit ec: ExecutionContext)
: Future[List[Try[A]]] =
Future.traverse(futures)(toTry(_))
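A quick usage sketch (assuming an implicit ExecutionContext is in scope):
val settled: Future[List[Try[Int]]] =
allSettled(List(Future(1), Future[Int](throw new IllegalStateException("boom"))))
// eventually completes with List(Success(1), Failure(java.lang.IllegalStateException: boom))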
Normally in this case we would use Future.sequence to transform a collection of Futures into one single Future of a collection, so you can map on it. But Scala short-circuits on a failed Future and doesn't wait for anything after that (Scala considers one failure to be a failure for all), which doesn't fit your case.
In this case you need to map failed Futures to successful ones, then do the sequence, e.g.
val settledFuture = Future.sequence(List(future1, future2, ...).map(_.recoverWith { case _ => Future.unit }))
settledFuture.map(//Here it is all settled)
EDIT
Since the results need to be kept, instead of mapping to Future.unit, we map the actual result into another layer of Try:
val settledFuture = Future.sequence(
List(Future(1), Future(throw new Exception))
.map(_.map(Success(_)).recover { case e => Failure(e) })
)
settledFuture.map(println(_))
//Output: List(Success(1), Failure(java.lang.Exception))
EDIT2
It can be further simplified with transform:
Future.sequence(listOfFutures.map(_.transform(Success(_))))
Perhaps you could use a concurrent counter to keep track of the number of completed Futures, and then complete the Promise once all Futures have completed:
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{ ExecutionContext, Future, Promise }
def allSettled[T](futures: List[Future[T]])(implicit ec: ExecutionContext): Future[List[Future[T]]] = {
val p = Promise[List[Future[T]]]()
val length = futures.length
val completedCount = new AtomicInteger(0)
futures foreach {
_.onComplete { _ =>
if (completedCount.incrementAndGet == length) p.trySuccess(futures)
}
}
p.future
}
val futures = List(
Future(-11),
Future(throw new Exception("boom")),
Future(42)
)
allSettled(futures).andThen { case result => println(result) }
// Success(List(Future(Success(-11)), Future(Failure(java.lang.Exception: boom)), Future(Success(42))))
scastie

Multiple futures that may fail - returning both successes and failures?

I have a situation where I need to run a bunch of operations in parallel.
All operations have the same return value (say a Seq[String]).
Its possible that some of the operations may fail, and others successfully return results.
I want to return both the successful results, and any exceptions that happened, so I can log them for debugging.
Is there a built-in way, or easy way through any library (cats/scalaz) to do this, before I go and write my own class for doing this?
I was thinking of doing each operation in its own future, then checking each future, and returning a tuple of Seq[String] -> Seq[Throwable] where left value is the successful results (flattened / combined) and right is a list of any exceptions that occurred.
Is there a better way?
Using Await.ready, which you mention in a comment, generally loses most benefits from using futures. Instead you can do this just using the normal Future combinators. And let's do the more generic version, which works for any return type; flattening the Seq[String]s can be added easily.
def successesAndFailures[T](futures: Seq[Future[T]]): Future[(Seq[T], Seq[Throwable])] = {
// first, promote all futures to Either without failures
val eitherFutures: Seq[Future[Either[Throwable, T]]] =
futures.map(_.transform(x => Success(x.toEither)))
// then sequence to flip Future and Seq
val futureEithers: Future[Seq[Either[Throwable, T]]] =
Future.sequence(eitherFutures)
// finally, Seq of Eithers can be separated into Seqs of Lefts and Rights
futureEithers.map { seqOfEithers =>
val (lefts, rights) = seqOfEithers.partition(_.isLeft)
val failures = lefts.map(_.left.get)
val successes = rights.map(_.right.get)
(successes, failures)
}
}
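As noted above, flattening the Seq[String] case is then one extra step. A sketch, with ops standing in for your actual operations:
val ops: Seq[Future[Seq[String]]] = ???
val flattened: Future[(Seq[String], Seq[Throwable])] =
successesAndFailures(ops).map { case (successes, failures) =>
(successes.flatten, failures)
}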
Scalaz and Cats have separate to simplify the last step.
The types can be inferred by the compiler, they are shown just to help you see the logic.
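For illustration, the last step with cats could look like this (a sketch; cats defines separate on List rather than Seq):
import cats.implicits._
def separated(futureEithers: Future[List[Either[Throwable, String]]])(implicit ec: ExecutionContext): Future[(List[Throwable], List[String])] =
futureEithers.map(_.separate)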
Calling value on your Future returns an Option[Try[T]]. If the Future has not completed then the Option is None. If it has completed then it's easy to unwrap and process.
if (myFutr.isCompleted)
myFutr.value.map(_.fold(
(err: Throwable) => ???, // log the error
(ss: Seq[String]) => ??? // process results
))
else
??? // do something else, come back later
Sounds like a good use-case for the Try idiom (it's basically similar to the Either monad).
Example of usage from the doc:
import scala.util.{Success, Failure}
val f: Future[List[String]] = Future {
session.getRecentPosts
}
f onComplete {
case Success(posts) => for (post <- posts) println(post)
case Failure(t) => println("An error has occurred: " + t.getMessage)
}
It actually does a little bit more than what you asked because it is fully asynchronous. Does it fit your use-case?
I'd do it this way:
import scala.concurrent.{Future, ExecutionContext}
import scala.util.Success
def eitherify[A](f: Future[A])(implicit ec: ExecutionContext): Future[Either[Throwable, A]] = f.transform(tryResult => Success(tryResult.toEither))
def eitherifyF[A, B](f: A => Future[B])(implicit ec: ExecutionContext): A => Future[Either[Throwable, B]] = { a => eitherify(f(a)) }
// here we need some "cats" magic for `traverse` and `separate`
// instead of `traverse` you can use standard `Future.sequence`
// there is no analogue for `separate` in the standard library
import cats.implicits._
def myProgram[A, B](values: List[A], asyncF: A => Future[B])(implicit ec: ExecutionContext): Future[(List[Throwable], List[B])] = {
val appliedTransformations: Future[List[Either[Throwable, B]]] = values.traverse(eitherifyF(asyncF))
appliedTransformations.map(_.separate)
}
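Hypothetical usage, with fetch standing in for your actual async operation:
def fetch(url: String): Future[String] = ???
val urls = List("https://a", "https://b")
val outcome: Future[(List[Throwable], List[String])] = myProgram(urls, fetch)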

How to create an akka-stream Source from a Flow that generates values recursively?

I need to traverse an API that is shaped like a tree. For example, a directory structure or threads of discussion. It can be modeled via the following flow:
type ItemId = Int
type Data = String
case class Item(data: Data, kids: List[ItemId])
def randomData(): Data = scala.util.Random.alphanumeric.take(2).mkString
// 0 => [1, 9]
// 1 => [10, 19]
// 2 => [20, 29]
// ...
// 9 => [90, 99]
// _ => []
// NB. I don't have access to this function, only the itemFlow.
def nested(id: ItemId): List[ItemId] =
if (id == 0) (1 to 9).toList
else if (1 <= id && id <= 9) ((id * 10) to ((id + 1) * 10 - 1)).toList
else Nil
val itemFlow: Flow[ItemId, Item, NotUsed] =
Flow.fromFunction(id => Item(randomData, nested(id)))
How can I traverse this data? I got the following working:
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
import scala.concurrent.Await
import scala.concurrent.duration.Duration
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
val loop =
GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
val source = b.add(Flow[Int])
val merge = b.add(Merge[Int](2))
val fetch = b.add(itemFlow)
val bcast = b.add(Broadcast[Item](2))
val kids = b.add(Flow[Item].mapConcat(_.kids))
val data = b.add(Flow[Item].map(_.data))
val buffer = Flow[Int].buffer(100, OverflowStrategy.dropHead)
source ~> merge ~> fetch ~> bcast ~> data
merge <~ buffer <~ kids <~ bcast
FlowShape(source.in, data.out)
}
val flow = Flow.fromGraph(loop)
Await.result(
Source.single(0).via(flow).runWith(Sink.foreach(println)),
Duration.Inf
)
system.terminate()
However, since I'm using a flow with a buffer, the Stream will never complete.
Completes when upstream completes and buffered elements have been drained
Flow.buffer
I read the Graph cycles, liveness, and deadlocks section multiple times and I'm still struggling to find an answer.
This would create a live lock:
import java.util.concurrent.atomic.AtomicInteger
def unfold[S, E](seed: S, flow: Flow[S, E, NotUsed])(loop: E => List[S]): Source[E, NotUsed] = {
// keep track of how many element flows,
val remaining = new AtomicInteger(1) // 1 = seed
// should be > max loop(x)
val bufferSize = 10000
val (ref, publisher) =
Source.actorRef[S](bufferSize, OverflowStrategy.fail)
.toMat(Sink.asPublisher(true))(Keep.both)
.run()
ref ! seed
Source.fromPublisher(publisher)
.via(flow)
.map{x =>
loop(x).foreach{ c =>
remaining.incrementAndGet()
ref ! c
}
x
}
.takeWhile(_ => remaining.decrementAndGet > 0)
}
EDIT: I added a git repo to test your solution https://github.com/MasseGuillaume/source-unfold
Cause of Non-Completion
I don't think the cause of the stream never completing is the use of a flow with a buffer as such. The actual cause, similar to this question, is that Merge with its default parameter eagerComplete = false waits for both the source and the buffer to complete before it (merge) completes. But buffer is waiting on merge to complete. So merge is waiting on buffer, and buffer is waiting on merge.
You could set eagerComplete = true when creating your Merge. But using eager completion may unfortunately result in some children ItemId values never being queried.
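In the graph from the question that would be a one-line change, sketched here:
val merge = b.add(Merge[Int](2, eagerComplete = true))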
Indirect Solution
If you materialize a new stream for each level of the tree then the recursion can be extracted to outside of the stream.
You can construct a query function utilizing the itemFlow:
val itemQuery : Iterable[ItemId] => Future[Seq[Item]] =
(itemIds) => Source(itemIds.toList)
.via(itemFlow)
.runWith(Sink.seq[Item])
This query function can now be wrapped inside of a recursive helper function:
val recQuery : (Iterable[ItemId], Iterable[Data]) => Future[Seq[Data]] =
(itemIds, currentData) => itemQuery(itemIds) flatMap { allNewItems =>
val allNewKids = allNewItems.flatMap(_.kids).toSet
val allNewData = allNewItems.map(_.data)
if (allNewKids.isEmpty)
Future.successful((currentData ++ allNewData).toSeq)
else
recQuery(allNewKids, currentData ++ allNewData)
}
The number of streams created will be equal to the maximum depth of the tree.
Unfortunately, because Futures are involved, this recursive function is not tail-recursive and could result in a "stack overflow" if the tree is too deep.
I solved this problem by writing my own GraphStage.
import akka.NotUsed
import akka.stream._
import akka.stream.scaladsl._
import akka.stream.stage.{GraphStage, GraphStageLogic, OutHandler}
import scala.concurrent.ExecutionContext
import scala.collection.mutable
import scala.util.{Success, Failure, Try}
def unfoldTree[S, E](seeds: List[S],
flow: Flow[S, E, NotUsed],
loop: E => List[S],
bufferSize: Int)(implicit ec: ExecutionContext): Source[E, NotUsed] = {
Source.fromGraph(new UnfoldSource(seeds, flow, loop, bufferSize))
}
object UnfoldSource {
implicit class MutableQueueExtensions[A](private val self: mutable.Queue[A]) extends AnyVal {
def dequeueN(n: Int): List[A] = {
val b = List.newBuilder[A]
var i = 0
while (i < n) {
val e = self.dequeue
b += e
i += 1
}
b.result()
}
}
}
class UnfoldSource[S, E](seeds: List[S],
flow: Flow[S, E, NotUsed],
loop: E => List[S],
bufferSize: Int)(implicit ec: ExecutionContext) extends GraphStage[SourceShape[E]] {
val out: Outlet[E] = Outlet("UnfoldSource.out")
override val shape: SourceShape[E] = SourceShape(out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) with OutHandler {
// Nodes to expand
val frontier = mutable.Queue[S]()
frontier ++= seeds
// Nodes expanded
val buffer = mutable.Queue[E]()
// Using the flow to fetch more data
var inFlight = false
// Sink pulled but the buffer was empty
var downstreamWaiting = false
def isBufferFull() = buffer.size >= bufferSize
def fillBuffer(): Unit = {
val batchSize = Math.min(bufferSize - buffer.size, frontier.size)
val batch = frontier.dequeueN(batchSize)
inFlight = true
val toProcess =
Source(batch)
.via(flow)
.runWith(Sink.seq)(materializer)
val callback = getAsyncCallback[Try[Seq[E]]]{
case Failure(ex) => {
fail(out, ex)
}
case Success(es) => {
val got = es.size
inFlight = false
es.foreach{ e =>
buffer += e
frontier ++= loop(e)
}
if (downstreamWaiting && buffer.nonEmpty) {
val e = buffer.dequeue
downstreamWaiting = false
sendOne(e)
} else {
checkCompletion()
}
()
}
}
toProcess.onComplete(callback.invoke)
}
override def preStart(): Unit = {
checkCompletion()
}
def checkCompletion(): Unit = {
if (!inFlight && buffer.isEmpty && frontier.isEmpty) {
completeStage()
}
}
def sendOne(e: E): Unit = {
push(out, e)
checkCompletion()
}
def onPull(): Unit = {
if (buffer.nonEmpty) {
sendOne(buffer.dequeue)
} else {
downstreamWaiting = true
}
if (!isBufferFull && frontier.nonEmpty) {
fillBuffer()
}
}
setHandler(out, this)
}
}
Ah, the joys of cycles in Akka streams. I had a very similar problem which I solved in a deeply hacky way. Possibly it'll be helpful for you.
Hacky Solution:
// add a graph stage that will complete successfully if it sees no element within 5 seconds
val timedStopper = b.add(
Flow[Item]
.idleTimeout(5.seconds)
.recoverWithRetries(1, {
case _: TimeoutException => Source.empty[Item]
}))
source ~> merge ~> fetch ~> timedStopper ~> bcast ~> data
merge <~ buffer <~ kids <~ bcast
What this does: 5 seconds after the last element passes through the timedStopper stage, that stage completes the stream successfully. This is achieved via idleTimeout, which fails the stream with a TimeoutException, combined with recoverWithRetries, which turns that failure into a successful completion. (I did mention it was hacky.)
This is obviously not suitable if you might have more than 5 seconds between elements, or if you can't afford a long wait between the stream "actually" completing and Akka picking up on it. Thankfully, neither were a concern for us, and in that case it actually works pretty well!
Non-hacky solution
Unfortunately, the only ways I can think of to do this without cheating via timeouts are very, very complicated.
Basically, you need to be able to track two things:
are there any elements still in the buffer, or in process of being sent to the buffer
is the incoming source open
and complete the stream if and only if the answer to both questions is no. Native Akka building blocks are probably not going to be able to handle this. A custom graph stage might, however. An option might be to write one that takes the place of Merge and give it some way of knowing about the buffer contents, or possibly have it track both the IDs it receives and the IDs the broadcast is sending to the buffer. The problem being that custom graph stages are not particularly pleasant to write at the best of times, let alone when you're mixing logic across stages like this.
Warnings
Akka streams just don't work well with cycles, especially how they calculate completion. As a result, this may not be the only problem you encounter.
For instance, an issue we had with a very similar structure was that a failure in the source was treated as the stream completing successfully, with a succeeded Future being materialised. The problem is that by default, a stage that fails will fail its downstreams but cancel its upstreams (which counts as a successful completion for those stages). With a cycle like the one you have, the result is a race as cancellation propagates down one branch but failure down the other. You also need to check what happens if the sink errors; depending on the cancellation settings for broadcast, it's possible the cancellation will not propagate upwards and the source will happily continue pulling in elements.
One final option: avoid handling the recursive logic with streams at all. On one extreme, if there's any way for you to write a single tail-recursive method that pulls out all the nested items at once and put that into a Flow stage, that will solve your problems. On the other, we're seriously considering going to Kafka queueing for our own system.
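For illustration, the first extreme might look like this sketch, assuming you could resolve children synchronously (here the question's nested stands in; the real API only exposes itemFlow):
import scala.annotation.tailrec
def expandAll(seed: ItemId, children: ItemId => List[ItemId]): List[ItemId] = {
@tailrec
def go(frontier: List[ItemId], acc: List[ItemId]): List[ItemId] = frontier match {
case Nil => acc.reverse
case id :: rest => go(children(id) ::: rest, id :: acc)
}
go(List(seed), Nil)
}
// a single stage could then emit every nested id at once:
// Flow[ItemId].mapConcat(id => expandAll(id, nested))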

Stream Future in Play 2.5

Once again I am attempting to update some pre Play 2.5 code (based on this vid). For example the following used to be how to stream a Future:
Ok.chunked(Enumerator.generateM(Promise.timeout(Some("hello"), 500)))
I have created the following method for the work-around for Promise.timeout (deprecated) using Akka:
private def keepResponding(data: String, delay: FiniteDuration, interval: FiniteDuration): Future[Result] = {
val promise: Promise[Result] = Promise[Result]()
actorSystem.scheduler.schedule(delay, interval) { promise.success(Ok(data)) }
promise.future
}
According to the Play Framework Migration Guide; Enumerators should be rewritten to a Source and Source.unfoldAsync is apparently the equivalent of Enumerator.generateM so I was hoping that this would work (where str is a Future[String]):
def inf = Action { request =>
val str = keepResponding("stream me", 1.second, 2.second)
Ok.chunked(Source.unfoldAsync(str))
}
Of course I'm getting a Type mismatch error and when looking at the case class signature of unfoldAsync:
final class UnfoldAsync[S, E](s: S, f: S ⇒ Future[Option[(S, E)]])
I can see that the parameters are not correct but I'm not fully understanding what/how I should pass this through.
unfoldAsync is even more generic than Play!'s own generateM, as it allows you to pass a state (S) value through. This can make the emitted value depend on the previously emitted value(s).
The example below will load values by an increasing id, until the loading fails:
val source: Source[String, NotUsed] = Source.unfoldAsync(0){ id ⇒
loadFromId(id)
.map(s ⇒ Some((id + 1, s)))
.recover{case _ ⇒ None}
}
def loadFromId(id: Int): Future[String] = ???
In your case an internal state is not really needed, therefore you can just pass dummy values whenever required, e.g.
val source: Source[Result, NotUsed] = Source.unfoldAsync(NotUsed) { _ ⇒
schedule("stream me", 2.seconds).map(x ⇒ Some(NotUsed → x))
}
def schedule(data: String, delay: FiniteDuration): Future[Result] = {
akka.pattern.after(delay, system.scheduler){Future.successful(Ok(data))}
}
Note that your original implementation of keepResponding is incorrect, as you cannot complete a Promise more than once. Akka's after pattern offers a simpler way to achieve what you need.
However, note that in your specific case, Akka Streams offers a more idiomatic solution with Source.tick:
val source: Source[String, Cancellable] = Source.tick(1.second, 2.seconds, NotUsed).mapAsync(1){ _ ⇒
loadSomeFuture()
}
def loadSomeFuture(): Future[String] = ???
or even simpler in case you don't actually need asynchronous computation as in your example
val source: Source[String, Cancellable] = Source.tick(1.second, 2.seconds, "stream me")

Scala Future[A] and Future[Option[B]] composition

I have an app that manages Items. When the client queries an item by some info, the app first tries to find an existing item in the db with the info. If there isn't one, the app would
Check if info is valid. This is an expensive operation (much more so than a db lookup), so the app only performs this when there isn't an existing item in the db.
If info is valid, insert a new Item into the db with info.
There are two more classes, ItemDao and ItemService:
object ItemDao {
def findByInfo(info: Info): Future[Option[Item]] = ...
// This DOES NOT validate info; it assumes info is valid
def insertIfNotExists(info: Info): Future[Item] = ...
}
object ItemService {
// Very expensive
def isValidInfo(info: Info): Future[Boolean] = ...
// Ugly
def findByInfo(info: Info): Future[Option[Item]] = {
ItemDao.findByInfo(info) flatMap { maybeItem =>
if (maybeItem.isDefined)
Future.successful(maybeItem)
else
isValidInfo(info) flatMap {
if (_) ItemDao.insertIfNotExists(info) map (Some(_))
else Future.successful(None)
}
}
}
}
The ItemService.findByInfo(info: Info) method is pretty ugly. I've been trying to clean it up for a while, but it's difficult since there are three types involved (Future[Boolean], Future[Item], and Future[Option[Item]]). I've tried to use scalaz's OptionT to clean it up but the non-optional Futures make it not very easy either.
Any ideas on a more elegant implementation?
To expand on my comment.
Since you've already indicated a willingness to go down the route of monad transformers, this should do what you want. There is unfortunately quite a bit of line noise due to Scala's less than stellar typechecking here, but hopefully you find it elegant enough.
import scalaz._
import Scalaz._
object ItemDao {
def findByInfo(info: Info): Future[Option[Item]] = ???
// This DOES NOT validate info; it assumes info is valid
def insertIfNotExists(info: Info): Future[Item] = ???
}
object ItemService {
// Very expensive
def isValidInfo(info: Info): Future[Boolean] = ???
def findByInfo(info: Info): Future[Option[Item]] = {
lazy val nullFuture = OptionT(Future.successful(none[Item]))
lazy val insert = ItemDao.insertIfNotExists(info).liftM[OptionT]
lazy val validation =
isValidInfo(info)
.liftM[OptionT]
.ifM(insert, nullFuture)
val maybeItem = OptionT(ItemDao.findByInfo(info))
val result = maybeItem <+> validation
result.run
}
}
Two comments about the code:
We are using the OptionT monad transformer here to capture the Future[Option[_]] stuff and anything that just lives inside Future[_] we're liftMing up to our OptionT[Future, _] monad.
<+> is an operation provided by MonadPlus. In a nutshell, as the name suggests, MonadPlus captures the intuition that often times monads have an intuitive way of being combined (e.g. List(1, 2, 3) <+> List(4, 5, 6) = List(1, 2, 3, 4, 5, 6)). Here we're using it to short-circuit when findByInfo returns Some(item) rather than the usual behavior to short-circuit on None (this is roughly analogous to List(item) <+> List() = List(item)).
Other small note, if you actually wanted to go down the monad transformers route, often times you end up building everything in your monad transformer (e.g. ItemDao.findByInfo would return an OptionT[Future, Item]) so that you don't have extraneous OptionT.apply calls and then .run everything at the end.
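In that style the DAO itself would expose the transformer. Hypothetical signatures:
object ItemDao {
def findByInfo(info: Info): OptionT[Future, Item] = ???
def insertIfNotExists(info: Info): OptionT[Future, Item] = ???
}
// composition then stays in OptionT[Future, _], with a single .run at the boundary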
You don't need scalaz for this. Just break your flatMap into two steps:
first, find and validate, then insert if necessary. Something like this:
ItemDao.findByInfo(info).flatMap {
case None => isValidInfo(info).map(None -> _)
case x => Future.successful(x -> true)
}.flatMap {
case (_, true) => ItemDao.insertIfNotExists(info).map(Some(_))
case (x, _) => Future.successful(x)
}
Doesn't look too bad, does it? If you don't mind running validation in parallel with retrieval (marginally more expensive resource-wise, but likely faster on average), you could further simplify it like this:
ItemDao
.findByInfo(info)
.zip(isValidInfo(info))
.flatMap {
case (None, true) => ItemDao.insertIfNotExists(info).map(Some(_))
case (x, _) => Future.successful(x)
}
Also, what does insertIfNotExists return if the item does exist? If it returned the existing item, things could be even simpler:
isValidInfo(info)
.filter(identity)
.flatMap { _ => ItemDao.insertIfNotExists(info) }
.map { item => Some(item) }
.recover { case _: NoSuchElementException => None }
If you are comfortable with path-dependent type and higher-kinded type, something like the following can be an elegant solution:
type Const[A] = A
sealed trait Request {
type F[_]
type A
type FA = F[A]
def query(client: Client): Future[FA]
}
case class FindByInfo(info: Info) extends Request {
type F[x] = Option[x]
type A = Item
def query(client: Client): Future[Option[Item]] = ???
}
case class CheckIfValidInfo(info: Info) extends Request {
type F[x] = Const[x]
type A = Boolean
def query(client: Client): Future[Boolean] = ???
}
class DB {
private val dbClient: Client = ???
def exec(request: Request): Future[request.FA] = request.query(dbClient)
}
What this does is abstract over both the wrapper type (e.g. Option[_]) and the inner type. For types without a wrapper, we use the Const[_] type, which is basically an identity type.
In Scala, many problems like this can be solved elegantly using algebraic data types and the advanced type system (i.e. path-dependent types and higher-kinded types). Note that we now have a single point of entry, exec(request: Request), for executing db requests, instead of something like a DAO.
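Hypothetical usage, showing how the wrapper type is tracked per request:
val db = new DB
val info: Info = ???
val found: Future[Option[Item]] = db.exec(FindByInfo(info))
val valid: Future[Boolean] = db.exec(CheckIfValidInfo(info))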