Akka - convert Flow into Collection or Publisher - scala

I'm trying to split an Akka Source into two separate ones.
val requestFlow = Flow[BodyPartEntity].to(Sink.seq) // convert to Seq[BodyPartEntity]
val dataFlow = Flow[BodyPartEntity].to(Sink.asPublisher(fanout = false)) // convert to Publisher[BodyPartEntity]
implicit class EitherSourceExtension[L, R, Mat](source: Source[FormData.BodyPart, Mat]) {
def partition(left: Sink[BodyPartEntity, NotUsed], right: Sink[BodyPartEntity, NotUsed]): Graph[ClosedShape, NotUsed] = {
GraphDSL.create() { implicit builder =>
import akka.stream.scaladsl.GraphDSL.Implicits._
val partition = builder.add(Partition[FormData.BodyPart](2, element => if (element.getName == "request") 0 else 1))
source ~> partition.in
partition.out(0).map(_.getEntity) ~> left
partition.out(1).map(_.getEntity) ~> right
ClosedShape
}
}
}
How can I convert requestFlow into a Seq[BodyPartEntity] and dataFlow into a Publisher[BodyPartEntity]?

You could use a BroadcastHub for this. From the docs:
A BroadcastHub can be used to consume elements from a common producer by a dynamic set of consumers.
Simplified code:
val runnableGraph: RunnableGraph[Source[Int, NotUsed]] =
  Source(1 to 5).toMat(
    BroadcastHub.sink(bufferSize = 4))(Keep.right)
val fromProducer: Source[Int, NotUsed] = runnableGraph.run()
// Process the messages from the producer in two independent consumers
fromProducer.runForeach(msg => println("consumer1: " + msg))
fromProducer.runForeach(msg => println("consumer2: " + msg))
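A different route from BroadcastHub, closer to the code in the question, is to pass both sinks to GraphDSL.create so that their materialized values are kept. Note that Flow.to discards the sink's materialized value, which is why requestFlow and dataFlow in the question yield NotUsed. The following is a minimal sketch, assuming Akka 2.6+ (where the materializer is derived from the implicit ActorSystem). The Part type, the sample elements, and the object name are hypothetical stand-ins for FormData.BodyPart, chosen to avoid the akka-http dependency.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ClosedShape
import akka.stream.scaladsl.{GraphDSL, Keep, Partition, RunnableGraph, Sink, Source}
import org.reactivestreams.Publisher
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object SplitSketch {
  // Hypothetical stand-in for FormData.BodyPart: (name, payload)
  type Part = (String, String)

  def run(): (Seq[String], Seq[String]) = {
    implicit val system: ActorSystem = ActorSystem("split-sketch")
    val parts: Source[Part, NotUsed] =
      Source(List("request" -> "r1", "data" -> "d1", "data" -> "d2"))

    // Passing the sinks to create keeps both materialized values (Keep.both)
    val graph: RunnableGraph[(Future[immutable.Seq[String]], Publisher[String])] =
      RunnableGraph.fromGraph(
        GraphDSL.create(Sink.seq[String], Sink.asPublisher[String](fanout = false))(Keep.both) {
          implicit builder => (left, right) =>
            import GraphDSL.Implicits._
            val partition =
              builder.add(Partition[Part](2, p => if (p._1 == "request") 0 else 1))
            parts ~> partition.in
            partition.out(0).map(_._2) ~> left
            partition.out(1).map(_._2) ~> right
            ClosedShape
        })

    val (requestSeq, dataPublisher) = graph.run()
    // The publisher must be subscribed, otherwise the "data" branch backpressures the stream
    val dataSeq = Source.fromPublisher(dataPublisher).runWith(Sink.seq)
    try (Await.result(requestSeq, 10.seconds), Await.result(dataSeq, 10.seconds))
    finally system.terminate()
  }
}
```

In this sketch the Publisher is turned back into a Source only to make the result observable; in real code it would be handed to whatever Reactive Streams consumer needs it.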

Related

Akka Stream - Parallel Processing with Partition

I'm looking for a way to implement or use a fan-out that takes 1 input and broadcasts to N outputs in parallel; the difference is that I want to partition them.
Example: one input element can be emitted to 4 different outputs and another to 2 other outputs, depending on some function f.
source ~> partitionWithBroadcast // Outputs to some subset of [0,3] outputs
partitionWithBroadcast(0) ~> ...
partitionWithBroadcast(1) ~> ...
partitionWithBroadcast(2) ~> ...
partitionWithBroadcast(3) ~> ...
I searched the Akka documentation but couldn't find any suitable operator.
Any ideas?
What comes to mind is a FanOutShape with filters attached to each output. NOTE: I am not using the standard Partition operator because it emits to just 1 output. The question asks to emit to any of the connected outputs. E.g.:
def createPartial[E](partitioner: E => Set[Int]) = {
  GraphDSL.create[FanOutShape4[E,E,E,E,E]]() { implicit builder =>
    import GraphDSL.Implicits._
    val flow = builder.add(Flow.fromFunction((e: E) => (e, partitioner(e))))
    val broadcast = builder.add(Broadcast[(E, Set[Int])](4))
    val flow0 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(0)).map(_._1))
    val flow1 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(1)).map(_._1))
    val flow2 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(2)).map(_._1))
    val flow3 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(3)).map(_._1))
    flow.out ~> broadcast.in
    broadcast.out(0) ~> flow0.in
    broadcast.out(1) ~> flow1.in
    broadcast.out(2) ~> flow2.in
    broadcast.out(3) ~> flow3.in
    new FanOutShape4[E,E,E,E,E](flow.in, flow0.out, flow1.out, flow2.out, flow3.out)
  }
}
The partitioner is a function that maps an element from upstream to the set of output indices that should receive it. The graph first pairs each element with that set, then broadcasts the pair. A filter flow attached to each output of the Broadcast keeps only the elements whose set contains that output's index, and then strips the set again.
Then use it e.g. as:
implicit val system: ActorSystem = ActorSystem()
implicit val ec = system.dispatcher
def partitioner(s: String) = (0 to 3).filter(s(_) == '*').toSet
val src = Source(immutable.Seq("*__*", "**__", "__**", "_*__"))
val sink0 = Sink.seq[String]
val sink1 = Sink.seq[String]
val sink2 = Sink.seq[String]
val sink3 = Sink.seq[String]
def toFutureTuple[X](f0: Future[X], f1: Future[X], f2: Future[X], f3: Future[X]) =
  for { r0 <- f0; r1 <- f1; r2 <- f2; r3 <- f3 } yield (r0, r1, r2, r3)
val g = RunnableGraph.fromGraph(GraphDSL.create(src, sink0, sink1, sink2, sink3)((_,f0,f1,f2,f3) => toFutureTuple(f0,f1,f2,f3)) { implicit builder =>
(in, o0, o1, o2, o3) => {
import GraphDSL.Implicits._
val part = builder.add(createPartial(partitioner))
in ~> part.in
part.out0 ~> o0
part.out1 ~> o1
part.out2 ~> o2
part.out3 ~> o3
ClosedShape
}
})
val result = Await.result(g.run(), 10.seconds)
println("0: " + result._1.mkString(" "))
println("1: " + result._2.mkString(" "))
println("2: " + result._3.mkString(" "))
println("3: " + result._4.mkString(" "))
// Prints:
//
// 0: *__* **__
// 1: **__ _*__
// 2: __**
// 3: *__* __**
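The routing rule itself can be checked without running a stream. This is a plain-collections sketch that mirrors the filter-per-output logic of createPartial with the same sample strings; the route helper and the object name are hypothetical and not part of Akka.

```scala
object RoutingSketch {
  // Same partitioner as in the usage example: every '*' position is an output index
  def partitioner(s: String): Set[Int] = (0 to 3).filter(s(_) == '*').toSet

  // Hypothetical helper: for each output index, keep the elements whose
  // partition set contains that index (the job of the per-output filter flows)
  def route[E](elements: Seq[E], outputs: Int)(p: E => Set[Int]): Map[Int, Seq[E]] =
    (0 until outputs).map(i => i -> elements.filter(e => p(e).contains(i))).toMap

  val routed: Map[Int, Seq[String]] =
    route(Seq("*__*", "**__", "__**", "_*__"), 4)(partitioner)
  // routed(0) == Seq("*__*", "**__"), routed(2) == Seq("__**")
}
```

The per-output sequences match the output printed by the stream version above.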
First, implement your function to create the Partition:
def partitionFunction4[A](func: A => Int)(implicit builder: GraphDSL.Builder[NotUsed]) = {
// partition with 4 output ports
builder.add(Partition[A](4, inputElement => func(inputElement)))
}
Then create another function that builds a Sink with a log function, which is used to print each element to the console:
def stream[A](log: A => Unit) = Flow.fromFunction[A, A](el => {
log(el)
el
} ).to(Sink.ignore)
Connect all the elements in the graph function:
def graph[A](src: Source[A, NotUsed])
            (func4: A => Int, log: Int => A => Unit) = {
  RunnableGraph
    .fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._
      val partition4 = partitionFunction4(func4)
      // One logging function per output port (four in total)
      val flowSet0 = (0 until 4).map(in => log(in))
      src ~> partition4.in
      partition4.out(0) ~> stream(flowSet0(0))
      partition4.out(1) ~> stream(flowSet0(1))
      partition4.out(2) ~> stream(flowSet0(2))
      partition4.out(3) ~> stream(flowSet0(3))
      ClosedShape
    })
    .run()
}
Create a Source that emits five Int elements. The partition function is element % 4; its result selects the output port, and therefore the sink, that receives the element:
val source1: Source[Int, NotUsed] = Source(0 to 4)
graph[Int](source1)(f1 => f1 % 4,
in => {
el =>
println(s"Stream ${in} element ${el}")
})
The result is:
Stream 0 element 0
Stream 1 element 1
Stream 2 element 2
Stream 3 element 3
Stream 0 element 4

Akka Streams scala DSL and Op-Rabbit

I have started using Akka Streams and Op-Rabbit and am a bit confused.
I need to split the stream based on a predicate and then combine them much like I have done when creating graphs and using the Partition and Merge.
I have been able to do things like this using the GraphDSL.Builder, but can't seem to get it to work with AckedSource/Flow/Sink
The graph would look like:
                        | --> flow1 --> |
source--> partition --> |               | --> flow3 --> sink
                        | --> flow2 --> |
I'm not sure if splitWhen is what I should use because I always need exactly 2 flows.
This is a sample that does not do the partitioning and does not use the GraphDSL.Builder:
def splitExample(source: AckedSource[String, SubscriptionRef],
queueName: String)
(implicit actorSystem: ActorSystem): RunnableGraph[SubscriptionRef] = {
val toStringFlow: Flow[AckTup[Message], AckTup[String], NotUsed] = Flow[AckTup[Message]]
.map[AckTup[String]](tup => {
val (p,m) = tup
(p, new String(m.data))
})
val printFlow1: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]]
.map[AckTup[String]](tup => {
val (p, s) = tup
println(s"flow1 processing $s")
tup
})
val printFlow2: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]]
.map[AckTup[String]](tup => {
val (p, s) = tup
println(s"flow2 processing $s")
tup
})
source
.map(Message.queue(_, queueName))
.via(AckedFlow(toStringFlow))
// partition if string.length < 10
.via(AckedFlow(printFlow1))
.via(AckedFlow(printFlow2))
.to(AckedSink.ack)
}
This is the code that I can't seem to get working:
import GraphDSL.Implicits._
def buildModelAcked(source: AckedSource[String, SubscriptionRef] , queueName: String)(implicit actorSystem: ActorSystem): Graph[ClosedShape, Future[Done]] = {
import GraphDSL.Implicits._
GraphDSL.create(Sink.ignore) { implicit builder: GraphDSL.Builder[Future[Done]] => s =>
import GraphDSL.Implicits._
source.map(Message.queue(_, queueName)) ~> AckedFlow(toStringFlow) ~> AckedSink.ack
// source.map(Message.queue(_, queueName)).via(AckedFlow(toStringFlow)).to(AckedSink.ack)
ClosedShape
}}
The compiler can't resolve the ~> operator
So my questions are:
Is there an example project that uses the scala dsl to build graphs of Acked/Source/Flow/Sink?
Is there an example project that partitions and merges that is similar to what I am trying to do here?
Keep in mind the following definitions when dealing with acked-stream:
AckedSource[Out, Mat] is a wrapper for Source[AckTup[Out], Mat]
AckedFlow[In, Out, Mat] is a wrapper for Flow[AckTup[In], AckTup[Out], Mat]
AckedSink[In, Mat] is a wrapper for Sink[AckTup[In], Mat]
AckTup[T] is an alias for (Promise[Unit], T)
the classic flow combinators will operate on the T part of the AckTup
the .acked combinator will complete the Promise[Unit] of an AckedFlow
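To make the AckTup definitions concrete, here is a plain-Scala sketch with no acked-stream dependency. It only illustrates the idea of carrying a Promise[Unit] next to the payload; mapPayload and ack are hypothetical helpers, not the library's API or implementation.

```scala
import scala.concurrent.Promise

object AckTupSketch {
  // Same alias as described above
  type AckTup[T] = (Promise[Unit], T)

  // What "operating on the T part" amounts to: transform the payload,
  // pass the promise through untouched
  def mapPayload[A, B](tup: AckTup[A])(f: A => B): AckTup[B] = (tup._1, f(tup._2))

  // Stand-in for acknowledging: complete the promise and return the payload
  def ack[T](tup: AckTup[T]): T = {
    tup._1.trySuccess(())
    tup._2
  }
}
```

A message therefore stays unacknowledged through any number of payload transformations until something completes its promise.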
The GraphDSL edge operator (~>) will work against a bunch of Akka predefined shapes (see the code for GraphDSL.Implicits), but it won't work against custom shapes defined by the acked-stream lib.
You have two ways out:
Define your own ~> implicit operator, along the lines of the ones in GraphDSL.Implicits.
Unwrap the acked stages to obtain standard stages. You can access the wrapped stage using .wrappedRepr, which is available on AckedSource, AckedFlow and AckedSink.
Based on Stefano Bonetti's excellent direction, here is a possible solution:
graph:
                      |--> short --|
rabbitMq --> before --|            |--> after
                      |--> long  --|
solution:
val before: Flow[AckTup[Message], AckTup[String], NotUsed] = Flow[AckTup[Message]].map[AckTup[String]](tup => {
val (p,m) = tup
(p, new String(m.data))
})
val short: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]].map[AckTup[String]](tup => {
val (p, s) = tup
println(s"short: $s")
tup
})
val long: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]].map[AckTup[String]](tup => {
val (p, s) = tup
println(s"long: $s")
tup
})
val after: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]].map[AckTup[String]](tup => {
val (p, s) = tup
println(s"all $s")
tup
})
def buildSplitGraph(source: AckedSource[String, SubscriptionRef]
, queueName: String
, splitLength: Int)(implicit actorSystem: ActorSystem): Graph[ClosedShape, Future[Done]] = {
GraphDSL.create(Sink.ignore) { implicit builder: GraphDSL.Builder[Future[Done]] => s =>
val toShort = 0
val toLong = 1
// junctions
val split = builder.add(Partition[AckTup[String]](2, (tup: AckTup[String]) => {
val (p, s) = tup
if (s.length < splitLength) toShort else toLong
}
))
val merge = builder.add(Merge[AckTup[String]](2))
//graph
val beforeSplit = source.map(Message.queue(_, queueName)).wrappedRepr ~> AckedFlow(before).wrappedRepr
beforeSplit ~> split
// must do short, then long since the split goes in that order
split ~> AckedFlow(short).wrappedRepr ~> merge
split ~> AckedFlow(long).wrappedRepr ~> merge
// after the last AckedFlow, be sure to '.acked' so that the message will be removed from the queue
merge ~> AckedFlow(after).acked ~> s
ClosedShape
}}
As Stefano Bonetti said, the key was to use the .wrappedRepr associated with the AckedFlow and then to use the .acked combinator as the last step.

Akka Streams: How do I get Materialized Sink output from GraphDSL API?

This is a really simple, newbie question using the GraphDSL API. I read several related SO threads and I don't see the answer:
val actorSystem = ActorSystem("QuickStart")
val executor = actorSystem.dispatcher
val materializer = ActorMaterializer()(actorSystem)
val source: Source[Int, NotUsed] = Source(1 to 5)
val throttledSource = source.throttle(1, 1.second, 1, ThrottleMode.shaping)
val intDoublerFlow = Flow.fromFunction[Int, Int](i => i * 2)
val sink = Sink.foreach(println)
val graphModel = GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
throttledSource ~> intDoublerFlow ~> sink
// I presume I want to change this shape to something else
// but I can't figure out what it is.
ClosedShape
}
// TODO: This is RunnableGraph[NotUsed], I want RunnableGraph[Future[Done]] that gives the
// materialized Future[Done] from the sink. I presume I need to use a GraphDSL SourceShape
// but I can't get that working.
val graph = RunnableGraph.fromGraph(graphModel)
// This works and gives me the materialized sink output using the simpler API.
// But I want to use the GraphDSL so that I can add branches or junctures.
val graphThatIWantFromDslAPI = throttledSource.toMat(sink)(Keep.right)
The trick is to pass the stage you want the materialized value of (in your case, sink) to the GraphDSL.create. The function you pass as a second parameter changes as well, needing a Shape input parameter (s in the example below) which you can use in your graph.
val graphModel: Graph[ClosedShape, Future[Done]] = GraphDSL.create(sink) { implicit b => s =>
import GraphDSL.Implicits._
throttledSource ~> intDoublerFlow ~> s
// ClosedShape is just fine - it is always the shape of a RunnableGraph
ClosedShape
}
val graph: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(graphModel)
More info can be found in the docs.
val graphModel = GraphDSL.create(sink) { implicit b: Builder[Future[Done]] => sink =>
import akka.stream.scaladsl.GraphDSL.Implicits._
throttledSource ~> intDoublerFlow ~> sink
ClosedShape
}
val graph: RunnableGraph[Future[Done]] = RunnableGraph.fromGraph(graphModel)
val graphThatIWantFromDslAPI: RunnableGraph[Future[Done]] = throttledSource.toMat(sink)(Keep.right)
The difficulty with the GraphDSL API is that create is heavily overloaded. Passing your sink to create turns the Builder[NotUsed] into a Builder[Future[Done]], and the build block becomes a function builder => sink => shape instead of builder => shape.
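Since the question mentions adding branches later, here is a minimal sketch of the same technique with a Broadcast junction, assuming Akka 2.6+ (materializer derived from the implicit ActorSystem). Sink.seq is used instead of Sink.foreach so that the materialized value carries the elements; the object and value names are illustrative only.

```scala
import akka.actor.ActorSystem
import akka.stream.ClosedShape
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, RunnableGraph, Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object MatValueSketch {
  def run(): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("mat-value-sketch")
    val collect = Sink.seq[Int]

    // The sink passed to create supplies the graph's materialized value
    val graph: RunnableGraph[Future[immutable.Seq[Int]]] =
      RunnableGraph.fromGraph(GraphDSL.create(collect) { implicit b => out =>
        import GraphDSL.Implicits._
        val bcast = b.add(Broadcast[Int](2))
        Source(1 to 5) ~> bcast
        bcast ~> Flow[Int].map(_ * 2) ~> out
        // Side branch: its own materialized value is not exposed
        bcast ~> Sink.foreach[Int](i => println(s"side branch saw $i"))
        ClosedShape
      })

    try Await.result(graph.run(), 10.seconds)
    finally system.terminate()
  }
}
```

Only the sink handed to create contributes to the materialized value; to expose more than one, pass several graphs to create together with a combining function.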

How do you deal with futures in Akka Flow?

I have built an Akka graph that defines a flow. My objective is to reformat my future response and save it to a file. The flow is outlined below:
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val balancer = builder.add(Balance[(HttpRequest, String)](6, waitForAllDownstreams = false))
val merger = builder.add(Merge[Future[Map[String, String]]](6))
val fileSink = FileIO.toPath(outputPath, options)
val ignoreSink = Sink.ignore
val in = Source(seeds)
in ~> balancer.in
for (i <- Range(0,6)) {
balancer.out(i) ~>
wikiFlow.async ~>
// This maps to a Future[Map[String, String]]
Flow[(Try[HttpResponse], String)].map(parseHtml) ~>
merger
}
merger.out ~>
// When we merge we need to map our Map to a file
Flow[Future[Map[String, String]]].map((d) => {
// What is the proper way of serializing future map
// so I can work with it like a normal stream into fileSink?
// I could manually do ->
// d.foreach(someWriteToFileProcess(_))
// with ignoreSink, but this defeats the nice
// akka flow
}) ~>
fileSink
ClosedShape
})
I can hack this workflow to write my future map to a file via foreach, but I'm afraid this could somehow lead to concurrency issues with FileIO and it just doesn't feel right. What is the proper way to handle futures with our akka flow?
The easiest way to create a Flow that involves an asynchronous computation is mapAsync.
So, let's say you want a Flow that consumes Int and produces String using an asynchronous computation mapper: Int => Future[String] with a parallelism of 5.
val mapper: Int => Future[String] = (i: Int) => Future(i.toString)
val yourFlow = Flow[Int].mapAsync[String](5)(mapper)
Now, you can use this flow in your graph however you want.
An example usage would be:
val graph = GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val intSource = Source(1 to 10)
val printSink = Sink.foreach[String](s => println(s))
val yourMapper: Int => Future[String] = (i: Int) => Future(i.toString)
val yourFlow = Flow[Int].mapAsync[String](2)(yourMapper)
intSource ~> yourFlow ~> printSink
ClosedShape
}
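Applied to the question, where the elements after the merge are already Future[Map[String, String]], mapAsync can unwrap them by using the future itself as the result. The following is a minimal sketch, assuming Akka 2.6+; the "key=value" line format and the object and value names are hypothetical, since the question does not say how the map should be serialized.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import akka.util.ByteString
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object FutureElementsSketch {
  // Unwraps each Future element; parallelism 6 mirrors the six branches in the question
  val flatten: Flow[Future[Map[String, String]], Map[String, String], NotUsed] =
    Flow[Future[Map[String, String]]].mapAsync[Map[String, String]](6)(f => f)

  // Hypothetical serialization: one "key=value" line per map entry
  val toBytes: Flow[Map[String, String], ByteString, NotUsed] =
    Flow[Map[String, String]].map { m =>
      ByteString(m.map { case (k, v) => s"$k=$v\n" }.mkString)
    }

  def run(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("future-elements-sketch")
    val input = List(
      Future.successful(Map("title" -> "Akka")),
      Future.successful(Map("lang" -> "Scala")))
    // Sink.seq stands in for the file sink so the result can be inspected
    val done = Source(input).via(flatten).via(toBytes).map(_.utf8String).runWith(Sink.seq)
    try Await.result(done, 10.seconds)
    finally system.terminate()
  }
}
```

In the graph from the question this would read merger.out ~> flatten ~> toBytes ~> fileSink, because FileIO.toPath consumes ByteString elements.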

Idiomatic way to turn an Akka Source into a Spark InputDStream

I'm essentially trying to do the opposite of what is being asked in this question; that is to say, use a Source[A] to push elements into an InputDStream[A].
So far, I've managed to cobble together an implementation that uses a Feeder actor and a Receiver actor similar to the ActorWordCount example, but this seems a bit complex, so I'm curious whether there is a simpler way.
EDIT: Self-accepting after 5 days since there have been no good answers.
I've extracted the Actor-based implementation into a lib, Sparkka-streams, and it's been working for me thus far. When a solution to this question that is better shows up, I'll either update or deprecate the lib.
Its usage is as follows:
// InputDStream can then be used to build elements of the graph that require integration with Spark
val (inputDStream, feedDInput) = Streaming.connection[Int]()
val source = Source.fromGraph(GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val source = Source(1 to 10)
val bCast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val add1 = Flow[Int].map(_ + 1)
val times3 = Flow[Int].map(_ * 3)
source ~> bCast ~> add1 ~> merge
bCast ~> times3 ~> feedDInput ~> merge
SourceShape(merge.out)
})
val reducedFlow = source.runWith(Sink.fold(0)(_ + _))
whenReady(reducedFlow)(_ shouldBe 230)
val sharedVar = ssc.sparkContext.accumulator(0)
inputDStream.foreachRDD { rdd =>
rdd.foreach { i =>
sharedVar += i
}
}
ssc.start()
eventually(sharedVar.value shouldBe 165)
Ref: http://spark.apache.org/docs/latest/streaming-custom-receivers.html
You can do it like:
class StreamStopped extends RuntimeException("Stream stopped")
// Serializable factory class
case class SourceFactory(start: Int, end: Int) {
  def source = Source(start to end).map(_.toString)
}
class CustomReceiver(sourceFactory: SourceFactory)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) with Logging {
  implicit val materializer = ....
  def onStart(): Unit = {
    sourceFactory.source.runForeach { e =>
      if (isStopped) {
        // Stop the source by failing the stream
        throw new StreamStopped
      } else {
        store(e)
      }
    } onFailure {
      case _: StreamStopped => // ignore
      case ex: Throwable => reportError("Source exception", ex)
    }
  }
  def onStop(): Unit = {}
}
val customReceiverStream = ssc.receiverStream(new CustomReceiver(SourceFactory(1, 100)))