I have a flow graph with a broadcast and a zip inside. If anything fails inside this flow, regardless of what it is, I'd like to drop the problematic element and resume. I came up with the following solution:
val flow = Flow.fromGraph(GraphDSL.create() { implicit builder =>
  import GraphDSL.Implicits._

  val dangerousFlow = Flow[Int].map {
    case 5 => throw new RuntimeException("BOOM!")
    case x => x
  }
  val safeFlow = Flow[Int]

  val bcast = builder.add(Broadcast[Int](2))
  val zip = builder.add(Zip[Int, Int])

  bcast ~> dangerousFlow ~> zip.in0
  bcast ~> safeFlow ~> zip.in1

  FlowShape(bcast.in, zip.out)
})
Source(1 to 9)
  .via(flow)
  .withAttributes(ActorAttributes.supervisionStrategy(Supervision.restartingDecider))
  .runWith(Sink.foreach(println))
I'd expect it to print:
(1,1)
(2,2)
(3,3)
(4,4)
(5,5)
(6,6)
(7,7)
(8,8)
(9,9)
However, it deadlocks, printing only:
(1,1)
(2,2)
(3,3)
(4,4)
We've done some debugging, and it turns out the "resume" strategy is applied to the individual stages inside the graph, which causes dangerousFlow to resume after the failure and thus to demand another element from bcast. bcast, however, won't emit an element until safeFlow also demands one, which never happens, because safeFlow is waiting for demand from zip.
Is there a way to resume the graph regardless of what went wrong inside one of the stages?
I think you've diagnosed the problem well: when element 5 crashes dangerousFlow, the element 5 travelling through safeFlow would also need to be dropped, because once it reaches the zip stage you get exactly the problem you describe. I don't know how to solve this between the broadcast and zip stages, but what about pushing the problem further downstream, where it is easier to handle?
Consider using the following dangerousFlow:
import scala.util._

val dangerousFlow = Flow[Int].map {
  case 5 => Failure(new RuntimeException("BOOM!"))
  case x => Success(x)
}
Even in case of a problem, dangerousFlow still emits an element. You can then zip as you currently do and just add a collect stage as the last step of your graph. On a flow, this would look like:
Flow[(Try[Int], Int)].collect {
  case (Success(s), i) => s -> i
}
Now if, as you wrote, you really expect it to output the (5, 5) tuple, use the following:
Flow[(Try[Int], Int)].collect {
  case (Success(s), i) => s -> i
  case (_, i) => i -> i
}
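Putting it together, a minimal sketch (my assembly of the pieces above, not verbatim from the answer) could look like this; since nothing fails inside the graph anymore, the supervision strategy is no longer needed:

import scala.util.{Failure, Success, Try}

val flow = Flow.fromGraph(GraphDSL.create() { implicit builder =>
  import GraphDSL.Implicits._

  // dangerousFlow now emits Try[Int] instead of throwing, so element 5
  // still reaches the zip and the graph never deadlocks
  val dangerousFlow = Flow[Int].map {
    case 5 => Failure(new RuntimeException("BOOM!"))
    case x => Success(x)
  }
  val safeFlow = Flow[Int]

  val bcast = builder.add(Broadcast[Int](2))
  val zip = builder.add(Zip[Try[Int], Int])

  bcast ~> dangerousFlow ~> zip.in0
  bcast ~> safeFlow ~> zip.in1

  FlowShape(bcast.in, zip.out)
})

Source(1 to 9)
  .via(flow)
  .collect { case (Success(s), i) => s -> i } // drops the (Failure, 5) pair
  .runWith(Sink.foreach(println))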
I have a use case where I want to send a message to an external system, but the flow that sends this message takes and returns a type I can't use downstream. This is a great use case for a pass-through flow. I am using the implementation here. Initially I was worried that if the processingFlow uses mapAsyncUnordered, this flow wouldn't work: the processing flow may reorder messages, so the zip might push out a tuple with an incorrect pairing. For example:
val testSource = Source(1 until 50)
val processingFlow: Flow[Int, Int, NotUsed] = Flow[Int].mapAsyncUnordered(10)(x => Future {
  Thread.sleep(Random.nextInt(50))
  x * 10
})
val passThroughFlow = PassThroughFlow(processingFlow, Keep.both)
val future = testSource.via(passThroughFlow).runWith(Sink.seq)
I would expect the processing flow to reorder its outputs with respect to its input, and that I would get a result such as:
[(30,1), (40,2),(10,3),(10,4), ...]
with the right element (the passed-through one) always in order, but the left element, which goes through my mapAsyncUnordered, potentially joined with the wrong right element to make a bad tuple.
Instead, I actually get:
[(10,1), (20,2),(30,3),(40,4), ...]
Every time. Upon further investigation I noticed the code was running slowly, and in fact it is not running in parallel at all, despite the mapAsyncUnordered. I tried introducing a buffer before and after, as well as an async boundary, but it always seems to run sequentially. This explains why the output is always ordered, but I want my processing flow to have higher throughput.
I came up with the following workaround:
object PassThroughFlow {
  def keepRight[A, A1](processingFlow: Flow[A, A1, NotUsed]): Flow[A, A, NotUsed] =
    keepBoth[A, A1](processingFlow).map(_._2)

  def keepBoth[A, A1](processingFlow: Flow[A, A1, NotUsed]): Flow[A, (A1, A), NotUsed] =
    Flow.fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._
      val broadcast = builder.add(Broadcast[A](2))
      val zip = builder.add(ZipWith[A1, A, (A1, A)]((left, right) => (left, right)))

      broadcast.out(0) ~> processingFlow ~> zip.in0
      broadcast.out(1) ~> zip.in1
      FlowShape(broadcast.in, zip.out)
    })
}

object ParallelPassThroughFlow {
  def keepRight[A, A1](parallelism: Int, processingFlow: Flow[A, A1, NotUsed]): Flow[A, A, NotUsed] =
    keepBoth(parallelism, processingFlow).map(_._2)

  def keepBoth[A, A1](parallelism: Int, processingFlow: Flow[A, A1, NotUsed]): Flow[A, (A1, A), NotUsed] =
    Flow.fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._
      val fanOut = builder.add(Balance[A](outputPorts = parallelism))
      val merger = builder.add(Merge[(A1, A)](inputPorts = parallelism, eagerComplete = false))

      Range(0, parallelism).foreach { n =>
        val passThrough = PassThroughFlow.keepBoth(processingFlow)
        fanOut.out(n) ~> passThrough ~> merger.in(n)
      }
      FlowShape(fanOut.in, merger.out)
    })
}
Two questions:
In the original implementation, why does the zip inside the pass-through flow limit the parallelism of the mapAsyncUnordered?
Is my workaround sound, or could it be improved? I basically fan out the input to multiple copies of the pass-through flow and merge them back together. It seems to have the properties I want (parallel, yet each tuple stays correctly paired even if the processing flow reorders), yet something doesn't feel right.
The behavior you're witnessing is a result of how broadcast and zip work: broadcast emits downstream when all of its outputs signal demand; zip waits for all of its inputs before signaling demand (and emitting downstream).
broadcast.out(0) ~> processingFlow ~> zip.in0
broadcast.out(1) ~> zip.in1
Consider the movement of the first element (1) through the above graph. 1 is broadcast to both processingFlow and zip. zip immediately receives one of its inputs (1) and waits for its other input (10), which will take a little longer to arrive. Only when zip gets both 1 and 10 does it pull for more elements from upstream, thus triggering the movement of the second element (2) through the stream. And so on.
As for your ParallelPassThroughFlow, I don't know why "something doesn't feel right" to you.
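That said, if the goal is simply to regain parallelism while guaranteeing that each output stays paired with the input that produced it, one option (a sketch I'm adding, not part of the original answers, and only applicable when the processing step can be expressed as a function returning a Future rather than an arbitrary Flow) is to carry the input through the async computation itself instead of zipping two branches:

// Hypothetical helper: do the pass-through inside mapAsyncUnordered, so the
// pairing can never be wrong no matter how results are reordered.
def asyncPassThrough[A, B](parallelism: Int)(process: A => Future[B])(
    implicit ec: ExecutionContext): Flow[A, (B, A), NotUsed] =
  Flow[A].mapAsyncUnordered(parallelism)(a => process(a).map(b => (b, a)))

// usage, mirroring the example above
val future = testSource
  .via(asyncPassThrough(10) { x => Future { Thread.sleep(Random.nextInt(50)); x * 10 } })
  .runWith(Sink.seq)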
I have a RunnableGraph like the following. When there is a simple map between the broadcast and merge stages, everything is fine. However, with mapConcat, this code stops working after consuming the first element.
I want to know why it doesn't work.
RunnableGraph.fromGraph(GraphDSL.create() { implicit b =>
  import GraphDSL.Implicits._
  val M = b.add(MergePreferred[Int](1))
  val B = b.add(Broadcast[Int](2))
  val S = Source(List(3))

  S ~> M ~> Flow[Int].map { s => println(s); s } ~> B ~> Sink.ignore
  M.preferred <~ Flow[Int].map(x => List.fill(3)(x - 1)).mapConcat(x => { println(x); x }).filter(_ > 0) <~ B
  ClosedShape
})
// run() output:
// 3
// List(2,2,2)
The mapConcat stage blocks the feedback loop, and that is expected. Consider the following chain of events:
the mapConcat function prints List(2,2,2)
the mapConcat stage needs demand to emit the first of the 3 available elements (2, 2, 2)
the demand has to come from the Merge stage, and therefore from the Broadcast stage.
the Broadcast stage backpressures if any of its downstreams backpressures. Its downstreams are Sink.ignore (which never backpressures) and the mapConcat itself.
the mapConcat backpressures if "there are still remaining elements from the previously calculated collection", as per the docs. This is indeed the case.
In other words, your cycle is unbalanced. You are introducing more elements in the feedback loop than you are removing.
This issue is explained in detail in this documentation page, where a couple of solutions are also presented. For your specific case, because of the filter stage you have, introducing a buffer larger than 13 would print all the elements. However, note that the graph will just hang and not complete afterwards.
S ~> M ~> Flow[Int].map { s => println(s); s } ~> B ~> Sink.ignore
M.preferred <~ Flow[Int].buffer(20, OverflowStrategy.dropHead) <~ Flow[Int].map(x => List.fill(3)(x-1)).mapConcat(x => {println(x); x}).filter(_ > 0) <~ B
I am looking for a way to easily reuse akka-stream flows.
I treat the Flow I intend to reuse as a function, so I would like to keep its signature like:
Flow[Input, Output, NotUsed]
Now when I use this flow I would like to be able to 'call' this flow and keep the result aside for further processing.
So I want to start with a Flow emitting [Input], apply my flow, and proceed with a Flow emitting [(Input, Output)].
example:
val s: Source[Int, NotUsed] = Source(1 to 10)
val stringIfEven = Flow[Int].filter(_ % 2 == 0).map(_.toString)
val via: Source[(Int, String), NotUsed] = ???
Now this is not possible in a straightforward way, because combining the flows with .via() would give me a Flow emitting just [Output]:
val via: Source[String, NotUsed] = s.via(stringIfEven)
An alternative is to make my reusable flow emit [(Input, Output)], but that requires every flow to push its input through all of its stages and makes the code look bad.
So I came up with a combiner like this:
def tupledFlow[In, Out](flow: Flow[In, Out, _]): Flow[In, (In, Out), NotUsed] =
  Flow.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    val broadcast = b.add(Broadcast[In](2))
    val zip = b.add(Zip[In, Out])

    broadcast.out(0) ~> zip.in0
    broadcast.out(1) ~> flow ~> zip.in1
    FlowShape(broadcast.in, zip.out)
  })
which broadcasts the input both to the given flow and, in a parallel branch, directly to the Zip stage, where I join the values into a tuple. It can then be applied elegantly:
val tupled: Source[(Int, String), NotUsed] = s.via(tupledFlow(stringIfEven))
Everything works great, but when the given flow performs a filter operation, this combiner gets stuck and stops processing further events.
I guess that is due to Zip's behaviour, which requires all of its inputs to emit: in my case one branch passes the element through directly, so the other branch cannot drop that element with filter(); when it does, the flow stops because Zip keeps waiting for a push that never comes.
Is there a better way to achieve flow composition?
Is there anything I can do in my tupledFlow to get the desired behaviour when flow drops elements with filter?
Two possible approaches - with debatable elegance - are:
1) avoid using filtering stages, turning your filter into a Flow[Int, Option[String], NotUsed]. This way you can apply your zipping wrapper around your whole graph, as was your original plan. However, the code looks more cluttered, and there is added overhead from passing Nones around.
val stringIfEvenOrNone = Flow[Int].map {
  case x if x % 2 == 0 => Some(x.toString)
  case _ => None
}

val tupled: Source[(Int, String), NotUsed] = s.via(tupledFlow(stringIfEvenOrNone)).collect {
  case (num, Some(str)) => (num, str)
}
2) separate the filtering and transforming stages, and apply the filtering ones before your zipping wrapper. This is probably a more lightweight and better compromise.
val filterEven = Flow[Int].filter(_ % 2 == 0)
val intToString = Flow[Int].map(_.toString) // renamed: a val named toString clashes with Any.toString
val tupled: Source[(Int, String), NotUsed] = s.via(filterEven).via(tupledFlow(intToString))
EDIT
3) Posting another solution here for clarity, as per the discussions in the comments.
This flow wrapper emits each element produced by the given flow, paired with the original input element that generated it. It works for any kind of inner flow (emitting 0, 1 or more elements for each input).
def tupledFlow[In, Out](flow: Flow[In, Out, _]): Flow[In, (In, Out), NotUsed] =
  Flow[In].flatMapConcat(in => Source.single(in).via(flow).map(out => in -> out))
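Hypothetical usage with the stringIfEven flow from the question; inputs dropped by the inner flow simply produce no output pair:

val tupled: Source[(Int, String), NotUsed] =
  Source(1 to 10).via(tupledFlow(Flow[Int].filter(_ % 2 == 0).map(_.toString)))
// emits (2,"2"), (4,"4"), (6,"6"), (8,"8"), (10,"10")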
I came up with an implementation of tupledFlow that works when the wrapped Flow uses filter() or mapAsync(), and when the wrapped Flow emits 0, 1 or N elements for every input:
def tupledFlow[In, Out](flow: Flow[In, Out, _])(implicit materializer: Materializer,
                                                executionContext: ExecutionContext): Flow[In, (In, Out), NotUsed] = {
  val v: Flow[In, Seq[(In, Out)], NotUsed] = Flow[In].mapAsync(4) { in: In =>
    val outFuture: Future[Seq[Out]] = Source.single(in).via(flow).runWith(Sink.seq)
    val bothFuture: Future[Seq[(In, Out)]] = outFuture.map(seqOfOut => seqOfOut.map((in, _)))
    bothFuture
  }
  val onlyDefined: Flow[In, (In, Out), NotUsed] = v.mapConcat[(In, Out)](seq => seq.to[scala.collection.immutable.Iterable])
  onlyDefined
}
The only drawback I see here is that I am instantiating and materializing a flow for every single element, just to get the notion of "calling a flow as a function".
I didn't do any performance tests on that; however, since the heavy lifting is done in the wrapped Flow, which is executed in a future, I believe this will be OK.
This implementation passes all the tests from https://gist.github.com/kretes/8d5f2925de55b2a274148b69f79e55ac#file-tupledflowspec-scala
Imagine a
val myFlow: Flow[Element] = ... //some flow..
Given a weight function
val weightFunction: Element => Int
I would like to obtain a
val transformedFlow: Flow[List[Element]]
such that each element of the transformedFlow is a List[Element] in which the sum of the weights of the elements is greater than a given constant.
How would I achieve that?
How about using scan to create a stream of accumulated weights, then zipping the results with the original stream of elements, and then using splitAfter to create substreams? I have not even tried to compile the following, but I hope you get the idea:
val broadCast = builder.add(Broadcast[Element](2))
val zip = builder.add(Zip[Element, Boolean])

myFlow.shape.out ~> broadCast.in
broadCast.out(0) ~> zip.in0
broadCast.out(1).scan(0) { (totalWeight, elem) =>
  if (totalWeight > Limit) weightFunction(elem)
  else totalWeight + weightFunction(elem)
}.map(_ > Limit) ~> zip.in1

val resultFlow =
  zip.out.splitAfter(_._2)
    .fold(List.empty[Element]) { case (list, (elem, _)) => elem :: list }
    .concatSubstreams
(You might want to consider doing map(_.reverse) on the resultFlow.)
Edit: you don't even need to do the broadcast and zip if you change the return type of the scan a bit - see a runnable code example here: https://gist.github.com/MartinHH/a05a87269b1697d5f57a1c77db269767
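For reference, a sketch of that idea (assuming Element, weightFunction and Limit as defined above; not the exact code from the gist): let scan carry each element together with its running weight, then split, with no broadcast or zip at all:

val groupByWeight: Flow[Element, List[Element], NotUsed] =
  Flow[Element]
    .scan(Option.empty[Element] -> 0) { case ((_, total), elem) =>
      // restart the running total once the previous group exceeded the limit
      val newTotal = if (total > Limit) weightFunction(elem) else total + weightFunction(elem)
      Some(elem) -> newTotal
    }
    .collect { case (Some(elem), total) => elem -> total } // drop the seed emitted by scan
    .splitAfter { case (_, total) => total > Limit }        // cut a group once its weight exceeds Limit
    .fold(List.empty[Element]) { case (list, (elem, _)) => elem :: list }
    .map(_.reverse)
    .concatSubstreams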
Let's assume I want to create a Flow that takes Ints and outputs tuples of (doubled int, running sum). So I fan out the ints, map on one edge and scan on the other, then zip them back together. This is the result:
object Main extends App {
  implicit val system = ActorSystem()
  implicit val materializer = ActorMaterializer()

  val flow = Flow.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    val broadcast = b.add(Broadcast[Int](2))
    val zip = b.add(Zip[Int, Int])
    val flowMap = b.add(Flow[Int].map(_ * 2))
    val flowScan = b.add(Flow[Int].scan(0)(_ + _))

    broadcast.out(0) ~> flowMap ~> zip.in0
    broadcast.out(1) ~> flowScan ~> zip.in1
    FlowShape(broadcast.in, zip.out)
  })

  Source(1 to 5).via(flow).to(Sink.foreach(println)).run()
}
Unfortunately, this doesn't output anything. I researched it a bit and found out that:
Broadcast emits when all of the outputs stop backpressuring and there is an input element available,
Scan backpressures when downstream backpressures.
This makes the whole flow deadlock and nothing happens. Does somebody know how to achieve the result:
(2,0)
(4,1)
(6,3)
(8,6)
(10,10)
in a nice way? The only solution I have found so far is to use .buffer:
val flowScan = b.add(Flow[Int].buffer(1, OverflowStrategy.backpressure).scan(0)(_ + _))
But I don't really like this solution, because it describes not only the logic but also some technicalities...
The reason for the deadlock is that scan, upon its first demand, emits the start value (0 in this case) without passing demand upstream. This means that demand only reaches broadcast.out(0), and, as you said, broadcast only emits when there has been demand from all of its downstreams.
The buffer might seem like a technicality, but it actually expresses the graph according to what you want to achieve: you want to zip the two branches, but the scan branch will always be one element ahead of the other. This is very central to how akka-streams works.
So your expected result is not something that broadcast + zip can produce without additional graph nodes. I think the cleanest way to express what you want is to place the buffer as a separate stage before the scan; this makes it clearer that one branch will be ahead of the other:
broadcast.out(0) ~> flowMap ~> zip.in0
broadcast.out(1) ~> buffer ~> flowScan ~> zip.in1
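For completeness, a minimal runnable sketch of that layout (same setup as in the question, plus an import of akka.stream.OverflowStrategy):

val flow = Flow.fromGraph(GraphDSL.create() { implicit b =>
  import GraphDSL.Implicits._
  val broadcast = b.add(Broadcast[Int](2))
  val zip = b.add(Zip[Int, Int])
  val flowMap = b.add(Flow[Int].map(_ * 2))
  // the buffer lets the scan branch run one element ahead of the map branch
  val buffer = b.add(Flow[Int].buffer(1, OverflowStrategy.backpressure))
  val flowScan = b.add(Flow[Int].scan(0)(_ + _))

  broadcast.out(0) ~> flowMap ~> zip.in0
  broadcast.out(1) ~> buffer ~> flowScan ~> zip.in1
  FlowShape(broadcast.in, zip.out)
})

Source(1 to 5).via(flow).runWith(Sink.foreach(println))
// prints (2,0), (4,1), (6,3), (8,6), (10,10)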