Concatenating two Flows in an Akka stream - Scala

I am trying to concatenate two Flows and I am unable to explain the output of my implementation.
val source = Source(1 to 10)
val sink = Sink.foreach(println)
val flow1 = Flow[Int].map(s => s + 1)
val flow2 = Flow[Int].map(s => s * 10)
val flowGraph = Flow.fromGraph(
  GraphDSL.create() { implicit builder =>
    import GraphDSL.Implicits._
    val concat = builder.add(Concat[Int](2))
    val broadcast = builder.add(Broadcast[Int](2))
    broadcast ~> flow1 ~> concat.in(0)
    broadcast ~> flow2 ~> concat.in(1)
    FlowShape(broadcast.in, concat.out)
  }
)
source.via(flowGraph).runWith(sink)
I expect the following output from this code.
2
3
4
.
.
.
11
10
20
.
.
.
100
Instead, I see only "2" being printed. Can you please explain what is wrong in my implementation and how I should change the program to get the desired output?

From Akka Stream's API docs:
Concat:
Emits when the current stream has an element available; if the current input completes, it tries the next one
Broadcast:
Emits when all of the outputs stop backpressuring and there is an input element available
The two operators won't work in conjunction because there is a conflict in how they operate: Concat tries to pull all elements from one of Broadcast's outputs before switching to the other, whereas Broadcast won't emit unless there is demand on ALL of its outputs.
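If you do want to keep the Broadcast plus Concat shape, one workaround is to give the branch that Concat reads last an explicit buffer, so Broadcast always sees demand on both of its outputs. This is only a sketch (bufferedGraph is just an illustrative name) and it assumes the whole source fits in the buffer -- 10 elements against a buffer of 16 here:
import akka.stream.OverflowStrategy

// Sketch: buffer the second branch so Broadcast is never blocked waiting for
// demand from Concat's not-yet-active input; works because the source is small.
val bufferedGraph = Flow.fromGraph(
  GraphDSL.create() { implicit builder =>
    import GraphDSL.Implicits._
    val concat = builder.add(Concat[Int](2))
    val broadcast = builder.add(Broadcast[Int](2))
    broadcast ~> flow1 ~> concat.in(0)
    broadcast ~> flow2.buffer(16, OverflowStrategy.backpressure) ~> concat.in(1)
    FlowShape(broadcast.in, concat.out)
  }
)

source.via(bufferedGraph).runWith(sink)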
For what you need, though, you could simply concatenate using concat, as suggested by commenters:
source.via(flow1).concat(source.via(flow2)).runWith(sink)
or equivalently, use Source.combine like below:
Source.combine(source.via(flow1), source.via(flow2))(Concat[Int](_)).runWith(sink)

Using GraphDSL, which is a simplified version of the implementation of Source.combine:
val sg = Source.fromGraph(
  GraphDSL.create() { implicit builder =>
    import GraphDSL.Implicits._
    val concat = builder.add(Concat[Int](2))
    source ~> flow1 ~> concat
    source ~> flow2 ~> concat
    SourceShape(concat.out)
  }
)
sg.runWith(sink)

Related

How do you deal with futures and mapAsync in Akka Flow?

I built an Akka graph DSL defining a simple flow, but flow f4 takes 3 seconds to send an element while f2 takes 10 seconds.
As a result, I got: 3, 2, 3, 2. But this is not what I want. Since f2 takes so much time, I would like to get: 3, 3, 2, 2. Here's the code...
implicit val actorSystem = ActorSystem("NumberSystem")
implicit val materializer = ActorMaterializer()
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
  import GraphDSL.Implicits._
  val in = Source(List(1, 1))
  val out = Sink.foreach(println)
  val bcast = builder.add(Broadcast[Int](2))
  val merge = builder.add(Merge[Int](2))
  val yourMapper: Int => Future[Int] = (i: Int) => Future(i + 1)
  val yourMapper2: Int => Future[Int] = (i: Int) => Future(i + 2)
  val f1, f3 = Flow[Int]
  val f2 = Flow[Int].throttle(1, 10.second, 0, ThrottleMode.Shaping).mapAsync[Int](2)(yourMapper)
  val f4 = Flow[Int].throttle(1, 3.second, 0, ThrottleMode.Shaping).mapAsync[Int](2)(yourMapper2)

  in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
              bcast ~> f4 ~> merge
  ClosedShape
})
g.run()
So where am I going wrong? With Future, with mapAsync, or something else?
Thanks
Sorry, I'm new to Akka, so I'm still learning. To get the expected results, one way is to add async:
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
  import GraphDSL.Implicits._
  val in = Source(List(1, 1))
  val out = Sink.foreach(println)
  val bcast = builder.add(Broadcast[Int](2))
  val merge = builder.add(Merge[Int](2))
  val yourMapper: Int => Future[Int] = (i: Int) => Future(i + 1)
  val yourMapper2: Int => Future[Int] = (i: Int) => Future(i + 2)
  val f1, f3 = Flow[Int]
  val f2 = Flow[Int].throttle(1, 10.second, 0, ThrottleMode.Shaping).map(_ + 1)
    //.mapAsyncUnordered[Int](2)(yourMapper)
  val f4 = Flow[Int].throttle(1, 3.second, 0, ThrottleMode.Shaping).map(_ + 2)
    //.mapAsync[Int](2)(yourMapper2)

  in ~> f1 ~> bcast ~> f2.async ~> merge ~> f3 ~> out
              bcast ~> f4.async ~> merge
  ClosedShape
})
g.run()
As you've already figured out, replacing:
mapAsync(i => Future{i + delta})
with:
map(_ + delta).async
in the two flows would achieve what you want.
The difference in results boils down to the key difference between mapAsync and map + async. While mapAsync enables execution of Futures on parallel threads, the multiple mapAsync flow stages are still managed by the same underlying actor, which carries out operator fusion before execution (for cost efficiency in general).
On the other hand, async actually introduces an asynchronous boundary into the stream flow with the individual flow stages handled by separate actors. In your case, each of the two flow stages independently emits elements downstream and whichever element emitted first gets consumed first. Inevitably there is a cost for managing the stream across the asynchronous boundary and Akka Stream uses a windowed buffering strategy to amortize the cost (see this Akka Stream doc).
For more details re: difference between mapAsync and async, this blog post might be of interest.
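As a side note, the size of that buffering window can be tuned per stage via stream attributes. A minimal sketch, with buffer sizes chosen purely for illustration (f4Tuned is just an illustrative name):
import akka.stream.Attributes

// Sketch: give an asynchronous stage a 1-element input buffer, which makes the
// effect of the async boundary easier to observe (at some cost in throughput).
val f4Tuned = Flow[Int]
  .throttle(1, 3.second, 0, ThrottleMode.Shaping)
  .map(_ + 2)
  .async
  .addAttributes(Attributes.inputBuffer(initial = 1, max = 1))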
So you are trying to join together the results coming out of f2 and f4, in which case you're trying to do what is sometimes called the "scatter-gather" pattern.
I don't think there are off-the-shelf ways to implement it without adding a custom stateful stage that keeps track of the outputs from f2 and f4 and emits a record when both are available (see the sketch after this list). There are some complications to bear in mind:
What happens if f2 or f4 fails?
What happens if they take too long?
You need a unique key for each input record, so you know which output from f2 corresponds to which output from f4 (and vice versa).
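That said, for the simple case where f2 and f4 each emit exactly one element per input, in order and without failing, a Broadcast feeding a Zip pairs the two results for you. This is only a sketch under those assumptions, reusing the names from the question:
// Sketch: Zip emits only when both branches have produced a value, so the
// results of f2 and f4 for the same input arrive together as a tuple.
// Assumes one output per input, in order; failures and timeouts are not handled.
val scatterGather = Flow.fromGraph(GraphDSL.create() { implicit builder =>
  import GraphDSL.Implicits._
  val bcast = builder.add(Broadcast[Int](2))
  val zip = builder.add(Zip[Int, Int]())
  bcast.out(0) ~> f2 ~> zip.in0
  bcast.out(1) ~> f4 ~> zip.in1
  FlowShape(bcast.in, zip.out)
})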

How to stop runnable graph

Taking my first steps with Akka Streams. I have a graph similar to this one, copied from here:
val topHeadSink = Sink.head[Int]
val bottomHeadSink = Sink.head[Int]
val sharedDoubler = Flow[Int].map(_ * 2)
val g = RunnableGraph.fromGraph(GraphDSL.create(topHeadSink, bottomHeadSink)((_, _)) {
  implicit builder => (topHS, bottomHS) =>
    import GraphDSL.Implicits._
    val broadcast = builder.add(Broadcast[Int](2))
    Source.single(1) ~> broadcast.in
    broadcast.out(0) ~> sharedDoubler ~> topHS.in
    broadcast.out(1) ~> sharedDoubler ~> bottomHS.in
    ClosedShape
})
I can run the graph using g.run(), but how can I stop it?
In what circumstances should I stop it (other than when it is simply no longer needed, business-wise)?
This graph is contained within an actor. If the actor crashes, what will happen to the graph's underlying actor? Will it terminate as well?
As described in the documentation, the way to complete a graph from outside the graph is with KillSwitch. The example that you copied from the documentation is not a good candidate to illustrate this approach, as the source is only a single element, and the stream will complete very quickly when you run it. Let's adjust the graph to more easily see the KillSwitch in action:
val topSink = Sink.foreach(println)
val bottomSink = Sink.foreach(println)
val sharedDoubler = Flow[Int].map(_ * 2)
val killSwitch = KillSwitches.single[Int]
val g = RunnableGraph.fromGraph(GraphDSL.create(topSink, bottomSink, killSwitch)((_, _, _)) {
  implicit builder => (topS, bottomS, switch) =>
    import GraphDSL.Implicits._
    val broadcast = builder.add(Broadcast[Int](2))
    Source.fromIterator(() => (1 to 1000000).iterator) ~> switch ~> broadcast.in
    broadcast.out(0) ~> sharedDoubler ~> topS.in
    broadcast.out(1) ~> sharedDoubler ~> bottomS.in
    ClosedShape
})
val res = g.run // res is of type (Future[Done], Future[Done], UniqueKillSwitch)
Thread.sleep(1000)
res._3.shutdown()
The source now consists of one million elements, and the sinks now print the broadcasted elements. The stream runs for one second, which is not enough time to churn through all one million elements, before we call shutdown to complete the stream.
If you run a stream inside an actor, whether the lifecycle of the underlying actor (or actors) created to run the stream is tied to the lifecycle of the "enclosing" actor depends on how the materializer is created. Read the documentation for more information. The following blog post by Colin Breck about using an actor and a KillSwitch to manage the lifecycle of a stream is helpful as well: http://blog.colinbreck.com/integrating-akka-streams-and-akka-actors-part-ii/
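To illustrate that lifecycle point with the classic (untyped) API: if the materializer is created from the actor's context, the actors that run the stream become children of that actor and are torn down with it. A minimal sketch, assuming the ActorMaterializer API from the era of this question (StreamOwner is a made-up name):
import akka.actor.Actor
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

// Sketch: the materializer is created from this actor's context, so the stream's
// underlying actors are children of StreamOwner and stop when it stops.
class StreamOwner extends Actor {
  implicit val materializer: ActorMaterializer = ActorMaterializer()(context)

  Source(1 to 1000000).map(_ * 2).runWith(Sink.foreach[Int](println))

  override def receive: Receive = Actor.emptyBehavior
}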
There's a KillSwitch feature that should work for you. Check the answer to this other SO question: Proper way to stop Akka Streams on condition
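For a plain linear stream (no graph DSL), the usual form of that approach is viaMat with KillSwitches.single; a short sketch, assuming an ActorSystem and materializer are in scope:
import akka.stream.KillSwitches
import akka.stream.scaladsl.{Keep, Sink, Source}

// Sketch: materialize the kill switch together with the sink's completion future,
// then call shutdown() (or abort(ex)) from outside to stop the stream.
val (killSwitch, done) = Source(1 to 1000000)
  .map(_ * 2)
  .viaMat(KillSwitches.single[Int])(Keep.right)
  .toMat(Sink.foreach[Int](println))(Keep.both)
  .run()

killSwitch.shutdown()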

Akka Streams: why is Partition already connected when not using builder.add?

I'm trying out the Akka Stream API and I have no idea why this throws java.lang.IllegalArgumentException: [Partition.in] is already connected at line 5:
val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
  import GraphDSL.Implicits._
  val intSource = Source.fromIterator(() => Iterator.continually(Random.nextInt(10).toString))
  val validateInput: Flow[String, Message, NotUsed] = Flow[String].map(Message.fromString)
  val validationPartitioner = Partition[Message](2, { // #5 error here
    case _: Data => 0
    case _ => 1
  })
  val outputStream = Sink.foreach[Message](println(_))
  val errorStream = Sink.ignore

  intSource ~> validateInput ~> validationPartitioner.in
  validationPartitioner.out(0) ~> outputStream
  validationPartitioner.out(1) ~> errorStream
  ClosedShape
})
but if I wrap validationPartitioner in builder.add(...) and remove .in from
intSource ~> validateInput ~> validationPartitioner.in
everything works. If I just remove .in, the code doesn't compile. Why is use of the builder being forced? Am I missing something, or is it a bug?
All of the components of a graph must be added to the builder, but there are variants of the ~> operator that add the most commonly used components, such as Source and Flow, to the builder under the covers (see here and here). However, junction operations that perform a fan-in (such as Merge) or a fan-out (such as Partition) must be explicitly passed to builder.add if you're using the Graph DSL.
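For reference, a sketch of the fix using the names from the question: pass the Partition to builder.add and wire up the shape it returns.
// Sketch: builder.add registers the Partition with the graph and returns its
// shape, whose ports can each be connected exactly once.
val validationPartitioner = builder.add(Partition[Message](2, {
  case _: Data => 0
  case _ => 1
}))

intSource ~> validateInput ~> validationPartitioner.in
validationPartitioner.out(0) ~> outputStream
validationPartitioner.out(1) ~> errorStream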

How to get a Subscriber and Publisher from a broadcasted Akka stream?

I'm having problems getting Publishers and Subscribers out of my flows when using more complicated graphs. My goal is to provide an API of Publishers and Subscribers and run the Akka streaming internally. Here's my first try, which works just fine.
val subscriberSource = Source.subscriber[Boolean]
val someFunctionSink = Sink.foreach(Console.println)
val flow = subscriberSource.to(someFunctionSink)
//create Reactive Streams Subscriber
val subscriber: Subscriber[Boolean] = flow.run()
//prints true
Source.single(true).to(Sink(subscriber)).run()
But with a more complicated broadcast graph, I'm unsure how to get the Subscriber and Publisher objects out. Do I need a partial graph?
val subscriberSource = Source.subscriber[Boolean]
val someFunctionSink = Sink.foreach(Console.println)
val publisherSink = Sink.publisher[Boolean]
FlowGraph.closed() { implicit builder =>
  import FlowGraph.Implicits._
  val broadcast = builder.add(Broadcast[Boolean](2))
  subscriberSource ~> broadcast.in
  broadcast.out(0) ~> someFunctionSink
  broadcast.out(1) ~> publisherSink
}.run()
val subscriber: Subscriber[Boolean] = ???
val publisher: Publisher[Boolean] = ???
When you call RunnableGraph.run() the stream is run and the result is the "materialized value" for that run.
In your simple example the materialized value of Source.subscriber[Boolean] is Subscriber[Boolean]. In your complex example you want to combine materialized values of several components of your graph to a materialized value that is a tuple (Subscriber[Boolean], Publisher[Boolean]).
You can do that by passing the components whose materialized values you are interested in to FlowGraph.closed() and then specifying a function to combine the materialized values:
import akka.stream.scaladsl._
import org.reactivestreams._
val subscriberSource = Source.subscriber[Boolean]
val someFunctionSink = Sink.foreach(Console.println)
val publisherSink = Sink.publisher[Boolean]
val graph =
  FlowGraph.closed(subscriberSource, publisherSink)(Keep.both) { implicit builder =>
    (in, out) =>
      import FlowGraph.Implicits._
      val broadcast = builder.add(Broadcast[Boolean](2))
      in ~> broadcast.in
      broadcast.out(0) ~> someFunctionSink
      broadcast.out(1) ~> out
  }
val (subscriber: Subscriber[Boolean], publisher: Publisher[Boolean]) = graph.run()
See the Scaladocs for more information about the overloads of FlowGraph.closed.
(Keep.both is short for a function (a, b) => (a, b))
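For readers on a newer Akka Streams release (1.0 and later), where FlowGraph became GraphDSL plus RunnableGraph and the Reactive Streams endpoints were renamed, roughly the same graph looks like the sketch below (an approximate translation, not the original answer's code):
import akka.stream.ClosedShape
import akka.stream.scaladsl._
import org.reactivestreams.{Publisher, Subscriber}

// Sketch with the newer API: Source.asSubscriber / Sink.asPublisher replace
// Source.subscriber / Sink.publisher, and the graph is built with GraphDSL.
val subscriberSource = Source.asSubscriber[Boolean]
val someFunctionSink = Sink.foreach[Boolean](Console.println)
val publisherSink = Sink.asPublisher[Boolean](fanout = false)

val graph = RunnableGraph.fromGraph(
  GraphDSL.create(subscriberSource, publisherSink)(Keep.both) { implicit builder => (in, out) =>
    import GraphDSL.Implicits._
    val broadcast = builder.add(Broadcast[Boolean](2))
    in ~> broadcast.in
    broadcast.out(0) ~> someFunctionSink
    broadcast.out(1) ~> out
    ClosedShape
  }
)

val (subscriber, publisher) = graph.run()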

How can I pipe the output of an Akka Streams Merge to another Flow?

I'm playing with Akka Streams and have figured out most of the basics, but I'm not clear on how to take the result of a Merge and do further operations (map, filter, fold, etc.) on it.
I'd like to modify the following code so that, instead of piping the merge to a sink, I can manipulate the data further.
implicit val materializer = FlowMaterializer()
val items_a = Source(List(10,20,30,40,50))
val items_b = Source(List(60,70,80,90,100))
val sink = ForeachSink(println)
val materialized = FlowGraph { implicit builder =>
  import FlowGraphImplicits._
  val merge = Merge[Int]("m1")
  items_a ~> merge
  items_b ~> merge ~> sink
}.run()
I guess my primary problem is that I can't figure out how to make a flow component that doesn't have a source, and I can't figure out how to do a merge without using the special Merge object and the ~> syntax.
EDIT: This question and answer were written for, and work with, Akka Streams 0.11.
If you don't care about the semantics of Merge, where elements go downstream in a nondeterministic order, then you could just try concat on Source instead, like so:
items_a.concat(items_b).map(_ * 2).map(_.toString).foreach(println)
The difference here is that all items from items_a will flow downstream before any elements of items_b. If you really need the behavior of Merge, then you could consider something like the following (keep in mind that you will eventually need a sink, but you certainly can do additional transforms after the merging):
val items_a = Source(List(10,20,30,40,50))
val items_b = Source(List(60,70,80,90,100))
val sink = ForeachSink[Double](println)
val transform = Flow[Int].map(_ * 2).map(_.toDouble).to(sink)
val materialized = FlowGraph { implicit builder =>
  import FlowGraphImplicits._
  val merge = Merge[Int]("m1")
  items_a ~> merge
  items_b ~> merge ~> transform
}.run
In this example, you can see that I use the helper from the Flow companion to create a Flow without a specific input Source. From there I can then attach that to the merge point to get my additional processing.
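For reference, on current Akka Streams versions the same result no longer needs the graph DSL at all, because Source has a merge operator; a rough sketch (not the 0.11 API this answer was written against):
// Sketch with the current API: merge the two sources, then keep transforming
// the merged stream with ordinary operators before running it.
val items_a = Source(List(10, 20, 30, 40, 50))
val items_b = Source(List(60, 70, 80, 90, 100))

items_a
  .merge(items_b)
  .map(_ * 2)
  .map(_.toDouble)
  .runForeach(println)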
Use Source.combine:
val items_a :: items_b :: items_c = List(
  Source(List(10, 20, 30, 40, 50)),
  Source(List(60, 70, 80, 90, 100)),
  Source(List(110, 120, 130, 140, 1500))
)

Source.combine(items_a, items_b, items_c: _*)(Merge(_))
  .map(_ + 1)
  .runForeach(println)
Or, if you need to preserve the order of the input sources (e.g. everything from items_a before items_b, and everything from items_b before items_c), you can use Concat instead of Merge:
val items_a :: items_b :: items_c = List(
  Source(List(10, 20, 30, 40, 50)),
  Source(List(60, 70, 80, 90, 100)),
  Source(List(110, 120, 130, 140, 1500))
)

Source.combine(items_a, items_b, items_c: _*)(Concat(_))