Akka Streams why partition is already connected when not using builder.add? - scala

I'm trying out Akka Stream API and I have no idea why this throws java.lang.IllegalArgumentException: [Partition.in] is already connected in line 5
val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val intSource = Source.fromIterator(() => Iterator.continually(Random.nextInt(10).toString))
val validateInput: Flow[String, Message, NotUsed] = Flow[String].map(Message.fromString)
val validationPartitioner = Partition[Message](2, { // #5 error here
case _: Data => 0
case _ => 1
})
val outputStream = Sink.foreach[Message](println(_))
val errorStream = Sink.ignore
intSource ~> validateInput ~> validationPartitioner.in
validationPartitioner.out(0) ~> outputStream
validationPartitioner.out(1) ~> errorStream
ClosedShape
})
but if I change validationPartitioner to be wrapped in builder.add(...) and remove .in from
intSource ~> validateInput ~> validationPartitioner.in
Everything works. If I just remove .in the code doesn't compile. Why usage of builder is being forced and am I missing something or is it a bug?

All of the components of a graph must be added to the builder, but there are variants of the ~> operator that add the most commonly used components, such as Source and Flow, to the builder under the covers (see here and here). However, junction operations that perform a fan-in (such as Merge) or a fan-out (such as Partition) must be explicitly passed to builder.add if you're using the Graph DSL.

Related

How to stop runnable graph

Getting my first steps with akka streams. I have a graph similar to this one copied from here :
val topHeadSink = Sink.head[Int]
val bottomHeadSink = Sink.head[Int]
val sharedDoubler = Flow[Int].map(_ * 2)
val g = RunnableGraph.fromGraph(GraphDSL.create(topHeadSink, bottomHeadSink)((_, _)) { implicit builder =>
(topHS, bottomHS) =>
import GraphDSL.Implicits._
val broadcast = builder.add(Broadcast[Int](2))
Source.single(1) ~> broadcast.in
broadcast.out(0) ~> sharedDoubler ~> topHS.in
broadcast.out(1) ~> sharedDoubler ~> bottomHS.in
ClosedShape
})
I can run the graph using g.run()
but how can I stop it ?
in what circumstances should I do it (other than the no usage - business wise) ?
This graph is contained within an actor. if the Actor crashes what will happen with the graphs underlying actor ? will it terminate as well ?
As described in the documentation, the way to complete a graph from outside the graph is with KillSwitch. The example that you copied from the documentation is not a good candidate to illustrate this approach, as the source is only a single element, and the stream will complete very quickly when you run it. Let's adjust the graph to more easily see the KillSwitch in action:
val topSink = Sink.foreach(println)
val bottomSink = Sink.foreach(println)
val sharedDoubler = Flow[Int].map(_ * 2)
val killSwitch = KillSwitches.single[Int]
val g = RunnableGraph.fromGraph(GraphDSL.create(topSink, bottomSink, killSwitch)((_, _, _)) {
implicit builder => (topS, bottomS, switch) =>
import GraphDSL.Implicits._
val broadcast = builder.add(Broadcast[Int](2))
Source.fromIterator(() => (1 to 1000000).iterator) ~> switch ~> broadcast.in
broadcast.out(0) ~> sharedDoubler ~> topS.in
broadcast.out(1) ~> sharedDoubler ~> bottomS.in
ClosedShape
})
val res = g.run // res is of type (Future[Done], Future[Done], UniqueKillSwitch)
Thread.sleep(1000)
res._3.shutdown()
The source now consists of one million elements, and the sinks now print the broadcasted elements. The stream runs for one second, which is not enough time to churn through all one million elements, before we call shutdown to complete the stream.
If you run a stream inside an actor, whether the lifecycle of the underlying actor (or actors) that is created to run the stream is ntied to the lifecycle of the "enclosing" actor depends on how the materializer is created. Read the documentation for more information. The following blog post by Colin Breck about using an actor and KillSwitch to manage the lifecycle of a stream is helpful as well: http://blog.colinbreck.com/integrating-akka-streams-and-akka-actors-part-ii/
There's a KillSwitch feature that should work for you. Check the answer to this other SO question: Proper way to stop Akka Streams on condition

Merge and broadcast, building a (simple) Akka graph

The Akka documentation is vast and there are a lot of tutorials. But either they are outdated or they only cover the basics (or, maybe I simply can't find the right ones).
What I want to create is a websocket application with multiple clients and multiple sources on the server side. As I don't want to get over my head from the start, I want to make baby steps and incrementally increase the complexity of the software I am building.
After toying around with some simple flows I wanted to start with a more sophisticated graph now.
What I want is:
Two sources, one that pushes "keepAlive" messages from the server to the client (currently only one) and a second one that actually pushes useful data.
Now for the first one I have this:
val tickingSource: Source[Array[Byte], Cancellable] =
Source.tick(initialDelay = 1 second, interval = 10 seconds, tick = NotUsed)
.zipWithIndex
.map{ case (_, counter) => SomeMessage().toByteArray}
Where SomeMessage is a protobuf type.
Because I can't find an up-to-date way to add an actor as a source, I tried the following for my second source:
val secondSource = Source(1 to 1000)
val secondSourceConverter = Flow[Int].map(x => BigInteger.valueOf(x).toByteArray)
My attempt at the graph:
val g: RunnableGraph[NotUsed] = RunnableGraph.fromGraph(GraphDSL.create()
{
implicit builder =>
import GraphDSL.Implicits._
val sourceMerge = builder.add(Merge[Array[Byte]](2).named("sourceMerge"))
val x = Source(1 to 1000)
val y = Flow[Int].map(x => BigInteger.valueOf(x).toByteArray)
val out = Sink.ignore
tickingSource ~> sourceMerge ~> out
x ~> y ~> sourceMerge
ClosedShape
})
Now g is of type RunnableGraph[NotUsed] while it should be RunnableGraph[Array[Byte]] for my websocket. And I wonder here: am I already doing something completely wrong?
You need to pass the secondSourceConverter into the GraphDSL.create, like the following example taken from their docs. Here they import 2 sinks, but it's the same technique.
RunnableGraph.fromGraph(GraphDSL.create(topHeadSink, bottomHeadSink)((_, _)) { implicit builder =>
(topHS, bottomHS) =>
import GraphDSL.Implicits._
val broadcast = builder.add(Broadcast[Int](2))
Source.single(1) ~> broadcast.in
broadcast.out(0) ~> sharedDoubler ~> topHS.in
broadcast.out(1) ~> sharedDoubler ~> bottomHS.in
ClosedShape
})
Your graph is of type RunnableGraph[NotUsed] because you're using Sink.ignore. And you probably want a RunnableGraph[Future[Array[Byte]]] instead of a RunnableGraph[Array[Byte]]:
val byteSink = Sink.fold[Array[Byte], Array[Byte]](Array[Byte]())(_ ++ _)
val g = RunnableGraph.fromGraph(GraphDSL.create(byteSink) { implicit builder => bSink =>
import GraphDSL.Implicits._
val sourceMerge = builder.add(Merge[Array[Byte]](2))
tickingSource ~> sourceMerge ~> bSink.in
secondSource ~> secondSourceConverter ~> sourceMerge
ClosedShape
})
// RunnableGraph[Future[Array[Byte]]]
I'm not sure how you would like to process incoming messages but here is a simple example. Hope that it'll help you.
path("ws") {
extractUpgradeToWebSocket { upgrade =>
complete {
import scala.concurrent.duration._
val tickSource = Source.tick(1.second, 1.second, TextMessage("ping"))
val messagesSource = Source.queue(10, OverflowStrategy.backpressure)
messagesSource.mapMaterializedValue { queue =>
//do something with out queue
//like myHandler ! RegisterOutQueue(queue)
}
val sink = Sink.ignore
val source = tickSource.merge(messagesSource)
upgrade.handleMessagesWithSinkSource(
inSink = sink,
outSource = source
)
}
}

Akka Streams: How do I get Materialized Sink output from GraphDSL API?

This is a really simple, newbie question using the GraphDSL API. I read several related SO threads and I don't see the answer:
val actorSystem = ActorSystem("QuickStart")
val executor = actorSystem.dispatcher
val materializer = ActorMaterializer()(actorSystem)
val source: Source[Int, NotUsed] = Source(1 to 5)
val throttledSource = source.throttle(1, 1.second, 1, ThrottleMode.shaping)
val intDoublerFlow = Flow.fromFunction[Int, Int](i => i * 2)
val sink = Sink.foreach(println)
val graphModel = GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
throttledSource ~> intDoublerFlow ~> sink
// I presume I want to change this shape to something else
// but I can't figure out what it is.
ClosedShape
}
// TODO: This is RunnableGraph[NotUsed], I want RunnableGraph[Future[Done]] that gives the
// materialized Future[Done] from the sink. I presume I need to use a GraphDSL SourceShape
// but I can't get that working.
val graph = RunnableGraph.fromGraph(graphModel)
// This works and gives me the materialized sink output using the simpler API.
// But I want to use the GraphDSL so that I can add branches or junctures.
val graphThatIWantFromDslAPI = throttledSource.toMat(sink)(Keep.right)
The trick is to pass the stage you want the materialized value of (in your case, sink) to the GraphDSL.create. The function you pass as a second parameter changes as well, needing a Shape input parameter (s in the example below) which you can use in your graph.
val graphModel: Graph[ClosedShape, Future[Done]] = GraphDSL.create(sink) { implicit b => s =>
import GraphDSL.Implicits._
throttledSource ~> intDoublerFlow ~> s
// ClosedShape is just fine - it is always the shape of a RunnableGraph
ClosedShape
}
val graph: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(graphModel)
More info can be found in the docs.
val graphModel = GraphDSL.create(sink) { implicit b: Builder[Future[Done]] => sink =>
import akka.stream.scaladsl.GraphDSL.Implicits._
throttledSource ~> intDoublerFlow ~> sink
ClosedShape
}
val graph: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(graphModel)
val graphThatIWantFromDslAPI: RunnableGraph[Future[Done]] = throttledSource.toMat(sink)(Keep.right)
The problem with the GraphDSL API is, that the implicit Builder is heavily overloaded. You need to wrap your sink in create, which turns the Builder[NotUsed] into Builder[Future[Done]] and represents now a function from builder => sink => shape instead of builder => shape.

How do you deal with futures in Akka Flow?

I have built an akka graph that defines a flow. My objective is to reformat my future response and save it to a file. The flow can be outlined bellow:
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val balancer = builder.add(Balance[(HttpRequest, String)](6, waitForAllDownstreams = false))
val merger = builder.add(Merge[Future[Map[String, String]]](6))
val fileSink = FileIO.toPath(outputPath, options)
val ignoreSink = Sink.ignore
val in = Source(seeds)
in ~> balancer.in
for (i <- Range(0,6)) {
balancer.out(i) ~>
wikiFlow.async ~>
// This maps to a Future[Map[String, String]]
Flow[(Try[HttpResponse], String)].map(parseHtml) ~>
merger
}
merger.out ~>
// When we merge we need to map our Map to a file
Flow[Future[Map[String, String]]].map((d) => {
// What is the proper way of serializing future map
// so I can work with it like a normal stream into fileSink?
// I could manually do ->
// d.foreach(someWriteToFileProcess(_))
// with ignoreSink, but this defeats the nice
// akka flow
}) ~>
fileSink
ClosedShape
})
I can hack this workflow to write my future map to a file via foreach, but I'm afraid this could somehow lead to concurrency issues with FileIO and it just doesn't feel right. What is the proper way to handle futures with our akka flow?
The easiest way to create a Flow which involves an asynchronous computation is by using mapAsync.
So... lets say you want to create a Flow which consumes Int and produces String using an asynchronous computation mapper: Int => Future[String] with a parallelism of 5.
val mapper: Int => Future[String] = (i: Int) => Future(i.toString)
val yourFlow = Flow[Int].mapAsync[String](5)(mapper)
Now, you can use this flow in your graph however you want.
An example usage will be,
val graph = GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val intSource = Source(1 to 10)
val printSink = Sink.foreach[String](s => println(s))
val yourMapper: Int => Future[String] = (i: Int) => Future(i.toString)
val yourFlow = Flow[Int].mapAsync[String](2)(yourMapper)
intSource ~> yourFlow ~> printSink
ClosedShape
}

How to assemble an Akka Streams sink from multiple file writes?

I'm trying to integrate an akka streams based flow in to my Play 2.5 app. The idea is that you can stream in a photo, then have it written to disk as the raw file, a thumbnailed version and a watermarked version.
I managed to get this working using a graph something like this:
val byteAccumulator = Flow[ByteString].fold(new ByteStringBuilder())((builder, b) => {builder ++= b.toArray})
.map(_.result().toArray)
def toByteArray = Flow[ByteString].map(b => b.toArray)
val graph = Flow.fromGraph(GraphDSL.create() {implicit builder =>
import GraphDSL.Implicits._
val streamFan = builder.add(Broadcast[ByteString](3))
val byteArrayFan = builder.add(Broadcast[Array[Byte]](2))
val output = builder.add(Flow[ByteString].map(x => Success(Done)))
val rawFileSink = FileIO.toFile(file)
val thumbnailFileSink = FileIO.toFile(getFile(path, Thumbnail))
val watermarkedFileSink = FileIO.toFile(getFile(path, Watermarked))
streamFan.out(0) ~> rawFileSink
streamFan.out(1) ~> byteAccumulator ~> byteArrayFan.in
streamFan.out(2) ~> output.in
byteArrayFan.out(0) ~> slowThumbnailProcessing ~> thumbnailFileSink
byteArrayFan.out(1) ~> slowWatermarkProcessing ~> watermarkedFileSink
FlowShape(streamFan.in, output.out)
})
graph
}
Then I wire it in to my play controller using an accumulator like this:
val sink = Sink.head[Try[Done]]
val photoStorageParser = BodyParser { req =>
Accumulator(sink).through(graph).map(Right.apply)
}
The problem is that my two processed file sinks aren't completing and I'm getting zero sizes for both processed files, but not the raw one. My theory is that the accumulator is only waiting on one of the outputs of my fan out, so when the input stream completes and my byteAccumulator spits out the complete file, by the time the processing is finished play has got the materialized value from the output.
So, my questions are:
Am I on the right track with this as far as my approach goes?
What is the expected behaviour for running a graph like this?
How can I bring all my sinks together to form one final sink?
Ok, after a little help (Andreas was on the right track), I've arrived at this solution which does the trick:
val rawFileSink = FileIO.toFile(file)
val thumbnailFileSink = FileIO.toFile(getFile(path, Thumbnail))
val watermarkedFileSink = FileIO.toFile(getFile(path, Watermarked))
val graph = Sink.fromGraph(GraphDSL.create(rawFileSink, thumbnailFileSink, watermarkedFileSink)((_, _, _)) {
implicit builder => (rawSink, thumbSink, waterSink) => {
val streamFan = builder.add(Broadcast[ByteString](2))
val byteArrayFan = builder.add(Broadcast[Array[Byte]](2))
streamFan.out(0) ~> rawSink
streamFan.out(1) ~> byteAccumulator ~> byteArrayFan.in
byteArrayFan.out(0) ~> processorFlow(Thumbnail) ~> thumbSink
byteArrayFan.out(1) ~> processorFlow(Watermarked) ~> waterSink
SinkShape(streamFan.in)
}
})
graph.mapMaterializedValue[Future[Try[Done]]](fs => Future.sequence(Seq(fs._1, fs._2, fs._3)).map(f => Success(Done)))
After which it's dead easy to call this from Play:
val photoStorageParser = BodyParser { req =>
Accumulator(theSink).map(Right.apply)
}
def createImage(path: String) = Action(photoStorageParser) { req =>
Created
}