Akka Streams: No-arg GraphDSL.create() vs GraphDSL.create(sink) - scala

The doc has the following example (only what's relevant to my question is shown):
val resultSink = Sink.head[Int]
val g = RunnableGraph.fromGraph(GraphDSL.create(resultSink) { implicit b => sink =>
  import GraphDSL.Implicits._
  // importing the partial graph will return its shape (inlets & outlets)
  val pm3 = b.add(pickMaxOfThree)
  Source.single(1) ~> pm3.in(0)
  Source.single(2) ~> pm3.in(1)
  Source.single(3) ~> pm3.in(2)
  pm3.out ~> sink.in
  ClosedShape
})
I was curious why the sink has to be passed as a parameter to GraphDSL.create, so I modified the example slightly:
val resultSink = Sink.head[Int]
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit b =>
  import GraphDSL.Implicits._
  // importing the partial graph will return its shape (inlets & outlets)
  val pm3 = b.add(pickMaxOfThree)
  val s = b.add(resultSink).in
  Source.single(1) ~> pm3.in(0)
  Source.single(2) ~> pm3.in(1)
  Source.single(3) ~> pm3.in(2)
  pm3.out ~> s
  ClosedShape
})
However, this changes the return type of g.run() from Future[Int] to akka.NotUsed. Why?

I think I found the answer myself. According to the docs:
using builder.add(...), an operation that will make a copy of the blueprint that is passed to it and return the inlets and outlets of the resulting copy so that they can be wired up. Another alternative is to pass existing graphs—of any shape—into the factory method that produces a new graph. The difference between these approaches is that importing using builder.add(...) ignores the materialized value of the imported graph while importing via the factory method allows its inclusion.
g.run() returns the materialized value of the graph, hence the change in return type.
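To make the difference concrete, here is a minimal sketch of what run() hands back in each case (assuming scala.concurrent.Future is imported and an implicit materializer is in scope; the comments only indicate which of the two graphs above is meant):

// g built with GraphDSL.create(resultSink): the sink's materialized value is kept
val maxOfThree: Future[Int] = g.run() // eventually completes with 3

// g built with GraphDSL.create() and b.add(resultSink): the sink's Future[Int] is discarded
// val nothing: akka.NotUsed = g.run()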

Related

Eliminating internal collecting when constructing a Source as a response to data

I have a Flow (createDataPointFlow) that is built with a mapAsync which collects data points (via Sink.seq) that I would rather stream directly, i.e. without collecting them first.
However, it is not obvious to me how to do this without collecting items. It seems I need some mechanism to publish my items directly to the output side of the flow I am creating, but I'm new to this and don't know how to do that without getting explicit actors involved, which I would like to avoid.
How can I achieve this without collecting things into a Sink first? Remember, what I want is full streaming, without the explicit buffering that Sink.seq(...) is doing.
object MyProcess {

  def createDataSource(job: Job, dao: DataService): Source[JobDataPoint, NotUsed] = {
    // Imagine the below call is equivalent to streaming a parameterized query using Slick
    val publisher: Publisher[JobDataPoint] = dao.streamData(Criteria(job.name, job.data))
    // Convert to a Source
    val src: Source[JobDataPoint, NotUsed] = Source.fromPublisher(publisher)
    src
  }

  def createDataPointFlow(dao: DataService, parallelism: Int = 1): Flow[Job, JobDataPoint, NotUsed] =
    Flow[Job].mapAsync(parallelism)(job =>
      createDataSource(job, dao).toMat(Sink.seq)(Keep.right).run()
    ).mapConcat(identity)

  def apply(src: Source[Job, NotUsed], dao: DataService, parallelism: Int = 5) =
    RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._

      // Source
      val jobs: Outlet[Job] = builder.add(src).out
      //val bcastJobsSrc: Source[Job, NotUsed] = src.toMat(BroadcastHub.sink(256))(Keep.right).run()
      //val bcastOutlet: Outlet[Job] = builder.add(bcastJobsSrc).out

      // Flows
      val bcastJobs: UniformFanOutShape[Job, Job] = builder.add(Broadcast[Job](4))
      val rptMaker = builder.add(MyProcessors.flow(dao, parallelism))
      val dpFlow = createDataPointFlow(dao, parallelism)

      // Sinks
      val jobPrinter: Inlet[Job] = builder.add(Sink.foreach[Job](job => println(s"[MyGraph] Received job: ${job.name} => $job"))).in
      val jobList: Inlet[Job] = builder.add(Sink.fold(List.empty[Job])((list, job: Job) => job :: list)).in
      val reporter: Inlet[ReportTable] = builder.add(Sink.foreach[ReportTable](r => println(s"[Report]: $r"))).in
      val dpSink: Inlet[JobDataPoint] = builder.add(Sink.foreach[JobDataPoint](dp => println(s"[DataPoint]: $dp"))).in

      jobs ~> bcastJobs
      bcastJobs ~> jobPrinter
      bcastJobs ~> jobList
      bcastJobs ~> rptMaker ~> reporter
      bcastJobs ~> dpFlow ~> dpSink

      ClosedShape
    })
}
So, after re-reading the documentation about the various stages available, it turns out that what I needed was flatMapConcat:
def createDataPointFlow(dao: DataService, parallelism: Int = 1): Flow[Job, JobDataPoint, NotUsed] =
  Flow[Job].flatMapConcat(createDataSource(_, dao))
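For intuition, flatMapConcat streams each inner Source directly into the outer stream, element by element, instead of materializing it and collecting the results first. A tiny, self-contained illustration with made-up values (not part of the original code; shown with Akka 2.4/2.5-style APIs):

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source

implicit val system: ActorSystem = ActorSystem("flatMapConcatDemo")
implicit val materializer: ActorMaterializer = ActorMaterializer()

Source(List(1, 2, 3))
  .flatMapConcat(i => Source(List(i * 10, i * 10 + 1))) // each inner Source is streamed directly
  .runForeach(println)                                  // prints 10, 11, 20, 21, 30, 31, in order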

Akka Streams: why is Partition already connected when not using builder.add?

I'm trying out the Akka Streams API and I have no idea why this throws java.lang.IllegalArgumentException: [Partition.in] is already connected at line 5:
val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
  import GraphDSL.Implicits._

  val intSource = Source.fromIterator(() => Iterator.continually(Random.nextInt(10).toString))
  val validateInput: Flow[String, Message, NotUsed] = Flow[String].map(Message.fromString)
  val validationPartitioner = Partition[Message](2, { // #5 error here
    case _: Data => 0
    case _ => 1
  })
  val outputStream = Sink.foreach[Message](println(_))
  val errorStream = Sink.ignore

  intSource ~> validateInput ~> validationPartitioner.in
  validationPartitioner.out(0) ~> outputStream
  validationPartitioner.out(1) ~> errorStream
  ClosedShape
})
But if I wrap validationPartitioner in builder.add(...) and remove .in from
  intSource ~> validateInput ~> validationPartitioner.in
everything works. If I just remove .in, the code doesn't compile. Why is the use of the builder being forced here? Am I missing something, or is it a bug?
All of the components of a graph must be added to the builder, but there are variants of the ~> operator that add the most commonly used components, such as Source and Flow, to the builder under the covers (see here and here). However, junction operations that perform a fan-in (such as Merge) or a fan-out (such as Partition) must be explicitly passed to builder.add if you're using the Graph DSL.
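A minimal sketch of the corrected graph (the same code as in the question, with the Partition wrapped in builder.add and the .in dropped, as the asker describes):

val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
  import GraphDSL.Implicits._

  val intSource = Source.fromIterator(() => Iterator.continually(Random.nextInt(10).toString))
  val validateInput: Flow[String, Message, NotUsed] = Flow[String].map(Message.fromString)
  // fan-out junctions must be added to the builder explicitly
  val validationPartitioner = builder.add(Partition[Message](2, {
    case _: Data => 0
    case _       => 1
  }))
  val outputStream = Sink.foreach[Message](println(_))
  val errorStream = Sink.ignore

  intSource ~> validateInput ~> validationPartitioner
  validationPartitioner.out(0) ~> outputStream
  validationPartitioner.out(1) ~> errorStream
  ClosedShape
})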

Akka Streams: How do I get Materialized Sink output from GraphDSL API?

This is a really simple, newbie question using the GraphDSL API. I read several related SO threads and I don't see the answer:
val actorSystem = ActorSystem("QuickStart")
val executor = actorSystem.dispatcher
val materializer = ActorMaterializer()(actorSystem)

val source: Source[Int, NotUsed] = Source(1 to 5)
val throttledSource = source.throttle(1, 1.second, 1, ThrottleMode.shaping)
val intDoublerFlow = Flow.fromFunction[Int, Int](i => i * 2)
val sink = Sink.foreach(println)

val graphModel = GraphDSL.create() { implicit b =>
  import GraphDSL.Implicits._

  throttledSource ~> intDoublerFlow ~> sink

  // I presume I want to change this shape to something else
  // but I can't figure out what it is.
  ClosedShape
}

// TODO: This is RunnableGraph[NotUsed], I want RunnableGraph[Future[Done]] that gives the
// materialized Future[Done] from the sink. I presume I need to use a GraphDSL SourceShape
// but I can't get that working.
val graph = RunnableGraph.fromGraph(graphModel)

// This works and gives me the materialized sink output using the simpler API.
// But I want to use the GraphDSL so that I can add branches or junctures.
val graphThatIWantFromDslAPI = throttledSource.toMat(sink)(Keep.right)
The trick is to pass the stage you want the materialized value of (in your case, sink) to GraphDSL.create. The function you pass as the second parameter changes as well: it now takes a Shape parameter (s in the example below) which you can use in your graph.
val graphModel: Graph[ClosedShape, Future[Done]] = GraphDSL.create(sink) { implicit b => s =>
  import GraphDSL.Implicits._
  throttledSource ~> intDoublerFlow ~> s
  // ClosedShape is just fine - it is always the shape of a RunnableGraph
  ClosedShape
}
val graph: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(graphModel)
More info can be found in the docs.
val graphModel = GraphDSL.create(sink) { implicit b: Builder[Future[Done]] => sink =>
  import akka.stream.scaladsl.GraphDSL.Implicits._
  throttledSource ~> intDoublerFlow ~> sink
  ClosedShape
}
val graph: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(graphModel)

val graphThatIWantFromDslAPI: RunnableGraph[Future[Done]] = throttledSource.toMat(sink)(Keep.right)
The problem with the GraphDSL API is that the implicit Builder is heavily overloaded. You need to wrap your sink in create, which turns the Builder[NotUsed] into a Builder[Future[Done]], and the block you pass is now a function builder => sink => shape instead of builder => shape.
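For completeness, a short sketch of actually running the materialized graph, reusing the actorSystem, executor and materializer vals from the question (and assuming akka.Done and scala.concurrent.Future are imported):

implicit val mat = materializer
implicit val ec = executor

val done: Future[Done] = graph.run() // the Future[Done] materialized by Sink.foreach(println)
done.onComplete(_ => actorSystem.terminate())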

How to assemble an Akka Streams sink from multiple file writes?

I'm trying to integrate an Akka Streams based flow into my Play 2.5 app. The idea is that you can stream in a photo and have it written to disk as the raw file, a thumbnailed version, and a watermarked version.
I managed to get this working using a graph something like this:
val byteAccumulator = Flow[ByteString].fold(new ByteStringBuilder())((builder, b) => { builder ++= b.toArray })
  .map(_.result().toArray)

def toByteArray = Flow[ByteString].map(b => b.toArray)

val graph = Flow.fromGraph(GraphDSL.create() { implicit builder =>
  import GraphDSL.Implicits._

  val streamFan = builder.add(Broadcast[ByteString](3))
  val byteArrayFan = builder.add(Broadcast[Array[Byte]](2))
  val output = builder.add(Flow[ByteString].map(x => Success(Done)))

  val rawFileSink = FileIO.toFile(file)
  val thumbnailFileSink = FileIO.toFile(getFile(path, Thumbnail))
  val watermarkedFileSink = FileIO.toFile(getFile(path, Watermarked))

  streamFan.out(0) ~> rawFileSink
  streamFan.out(1) ~> byteAccumulator ~> byteArrayFan.in
  streamFan.out(2) ~> output.in
  byteArrayFan.out(0) ~> slowThumbnailProcessing ~> thumbnailFileSink
  byteArrayFan.out(1) ~> slowWatermarkProcessing ~> watermarkedFileSink

  FlowShape(streamFan.in, output.out)
})
graph
}
Then I wire it into my Play controller using an Accumulator like this:
val sink = Sink.head[Try[Done]]
val photoStorageParser = BodyParser { req =>
  Accumulator(sink).through(graph).map(Right.apply)
}
The problem is that my two processed-file sinks aren't completing, and I'm getting zero sizes for both processed files, but not for the raw one. My theory is that the Accumulator only waits on one of the outputs of my fan-out, so when the input stream completes and my byteAccumulator spits out the complete file, Play has already taken the materialized value from the output before the processing is finished.
So, my questions are:
Am I on the right track with this as far as my approach goes?
What is the expected behaviour for running a graph like this?
How can I bring all my sinks together to form one final sink?
OK, after a little help (Andreas was on the right track), I've arrived at this solution, which does the trick:
val rawFileSink = FileIO.toFile(file)
val thumbnailFileSink = FileIO.toFile(getFile(path, Thumbnail))
val watermarkedFileSink = FileIO.toFile(getFile(path, Watermarked))

val graph = Sink.fromGraph(GraphDSL.create(rawFileSink, thumbnailFileSink, watermarkedFileSink)((_, _, _)) {
  implicit builder => (rawSink, thumbSink, waterSink) => {
    val streamFan = builder.add(Broadcast[ByteString](2))
    val byteArrayFan = builder.add(Broadcast[Array[Byte]](2))

    streamFan.out(0) ~> rawSink
    streamFan.out(1) ~> byteAccumulator ~> byteArrayFan.in

    byteArrayFan.out(0) ~> processorFlow(Thumbnail) ~> thumbSink
    byteArrayFan.out(1) ~> processorFlow(Watermarked) ~> waterSink

    SinkShape(streamFan.in)
  }
})

graph.mapMaterializedValue[Future[Try[Done]]](fs => Future.sequence(Seq(fs._1, fs._2, fs._3)).map(f => Success(Done)))
After which it's dead easy to call this from Play:
val photoStorageParser = BodyParser { req =>
  Accumulator(theSink).map(Right.apply)
}

def createImage(path: String) = Action(photoStorageParser) { req =>
  Created
}

How to get a Subscriber and Publisher from a broadcasted Akka stream?

I'm having problems getting Publishers and Subscribers out of my flows when using more complicated graphs. My goal is to provide an API of Publishers and Subscribers and run the Akka stream internally. Here's my first try, which works just fine.
val subscriberSource = Source.subscriber[Boolean]
val someFunctionSink = Sink.foreach(Console.println)
val flow = subscriberSource.to(someFunctionSink)
//create Reactive Streams Subscriber
val subscriber: Subscriber[Boolean] = flow.run()
//prints true
Source.single(true).to(Sink(subscriber)).run()
But with a more complicated broadcast graph, I'm unsure how to get the Subscriber and Publisher objects out. Do I need a partial graph?
val subscriberSource = Source.subscriber[Boolean]
val someFunctionSink = Sink.foreach(Console.println)
val publisherSink = Sink.publisher[Boolean]

FlowGraph.closed() { implicit builder =>
  import FlowGraph.Implicits._
  val broadcast = builder.add(Broadcast[Boolean](2))
  subscriberSource ~> broadcast.in
  broadcast.out(0) ~> someFunctionSink
  broadcast.out(1) ~> publisherSink
}.run()
val subscriber: Subscriber[Boolean] = ???
val publisher: Publisher[Boolean] = ???
When you call RunnableGraph.run(), the stream is run and the result is the "materialized value" for that run.
In your simple example, the materialized value of Source.subscriber[Boolean] is a Subscriber[Boolean]. In your complex example, you want to combine the materialized values of several components of your graph into a materialized value that is a tuple (Subscriber[Boolean], Publisher[Boolean]).
You can do that by passing the components whose materialized values you are interested in to FlowGraph.closed() and then specifying a function that combines them:
import akka.stream.scaladsl._
import org.reactivestreams._

val subscriberSource = Source.subscriber[Boolean]
val someFunctionSink = Sink.foreach(Console.println)
val publisherSink = Sink.publisher[Boolean]

val graph =
  FlowGraph.closed(subscriberSource, publisherSink)(Keep.both) { implicit builder ⇒
    (in, out) ⇒
      import FlowGraph.Implicits._
      val broadcast = builder.add(Broadcast[Boolean](2))
      in ~> broadcast.in
      broadcast.out(0) ~> someFunctionSink
      broadcast.out(1) ~> out
  }

val (subscriber: Subscriber[Boolean], publisher: Publisher[Boolean]) = graph.run()
See the Scaladocs for more information about the overloads of FlowGraph.closed.
(Keep.both is short for the function (a, b) => (a, b).)
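From there, using the materialized values is just like in the simple example. A short sketch, staying with the same pre-2.4 API as the answer (in later Akka versions these would be Source.fromPublisher and Sink.fromSubscriber):

// push an element into the graph through the materialized Subscriber
Source.single(true).to(Sink(subscriber)).run()

// consume what the graph broadcasts from the materialized Publisher
Source(publisher).to(Sink.foreach(Console.println)).run()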