Akka Stream - Parallel Processing with Partition - scala

I'm looking for a way to implement or use a fan-out stage that takes one input and broadcasts to N outputs in parallel. The difference from a plain broadcast is that I want to partition the elements.
Example: one input element can be emitted to 4 different outputs, while another can be emitted to 2 other outputs, depending on some function f.
source ~> partitionWithBroadcast // Outputs to some subset of [0,3] outputs
partitionWithBroadcast(0) ~> ...
partitionWithBroadcast(1) ~> ...
partitionWithBroadcast(2) ~> ...
partitionWithBroadcast(3) ~> ...
I searched the Akka documentation but couldn't find any suitable operator.
Any ideas?

What comes to mind is a FanOutShape with filters attached to each output. NOTE: I am not using the standard Partition operator because it emits to just 1 output. The question asks to emit to any of the connected outputs. E.g.:
import akka.stream.FanOutShape4
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL}

def createPartial[E](partitioner: E => Set[Int]) = {
  GraphDSL.create[FanOutShape4[E,E,E,E,E]]() { implicit builder =>
    import GraphDSL.Implicits._
    // pair every element with the set of outputs it should reach
    val flow = builder.add(Flow.fromFunction((e: E) => (e, partitioner(e))))
    val broadcast = builder.add(Broadcast[(E, Set[Int])](4))
    // one filter per output: keep the element only if its set contains that output's index
    val flow0 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(0)).map(_._1))
    val flow1 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(1)).map(_._1))
    val flow2 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(2)).map(_._1))
    val flow3 = builder.add(Flow[(E, Set[Int])].filter(_._2.contains(3)).map(_._1))
    flow.out ~> broadcast.in
    broadcast.out(0) ~> flow0.in
    broadcast.out(1) ~> flow1.in
    broadcast.out(2) ~> flow2.in
    broadcast.out(3) ~> flow3.in
    new FanOutShape4[E,E,E,E,E](flow.in, flow0.out, flow1.out, flow2.out, flow3.out)
  }
}
The partitioner is a function that maps an element from upstream to the set of output indices it should be sent to. The graph first pairs each element with that set, then broadcasts the tuple. A flow attached to each output of the Broadcast keeps only the elements that the partitioner assigned to that output and strips the set again.
Then use it e.g. as:
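import akka.actor.ActorSystem
import akka.stream.ClosedShape
import akka.stream.scaladsl.{GraphDSL, RunnableGraph, Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
implicit val system: ActorSystem = ActorSystem()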
implicit val ec = system.dispatcher
def partitioner(s: String) = (0 to 3).filter(s(_) == '*').toSet
val src = Source(immutable.Seq("*__*", "**__", "__**", "_*__"))
val sink0 = Sink.seq[String]
val sink1 = Sink.seq[String]
val sink2 = Sink.seq[String]
val sink3 = Sink.seq[String]
// combine the four materialized futures into a single future of a 4-tuple
def toFutureTuple[X](f0: Future[X], f1: Future[X], f2: Future[X], f3: Future[X]) =
  for (a <- f0; b <- f1; c <- f2; d <- f3) yield (a, b, c, d)
val g = RunnableGraph.fromGraph(GraphDSL.create(src, sink0, sink1, sink2, sink3)((_,f0,f1,f2,f3) => toFutureTuple(f0,f1,f2,f3)) { implicit builder =>
(in, o0, o1, o2, o3) => {
import GraphDSL.Implicits._
val part = builder.add(createPartial(partitioner))
in ~> part.in
part.out0 ~> o0
part.out1 ~> o1
part.out2 ~> o2
part.out3 ~> o3
ClosedShape
}
})
val result = Await.result(g.run(), 10.seconds)
println("0: " + result._1.mkString(" "))
println("1: " + result._2.mkString(" "))
println("2: " + result._3.mkString(" "))
println("3: " + result._4.mkString(" "))
// Prints:
//
// 0: *__* **__
// 1: **__ _*__
// 2: __**
// 3: *__* __**
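The routing rule described above can be checked without Akka. The following is only a collection-based model of "broadcast, then filter per output", not part of the original answer; the names RoutingModel and route are made up for illustration.

```scala
object RoutingModel {
  // Same rule as the partitioner in the answer: positions of '*' select the outputs.
  def partitioner(s: String): Set[Int] = (0 to 3).filter(i => s(i) == '*').toSet

  // Hypothetical helper: models Broadcast(n) followed by one filter per output.
  // Output i keeps exactly the elements whose index set contains i.
  def route[E](elements: Seq[E], outputs: Int)(f: E => Set[Int]): Vector[Seq[E]] =
    Vector.tabulate(outputs)(i => elements.filter(e => f(e).contains(i)))

  def main(args: Array[String]): Unit = {
    val routed = route(Seq("*__*", "**__", "__**", "_*__"), 4)(partitioner)
    routed.zipWithIndex.foreach { case (es, i) => println(s"$i: ${es.mkString(" ")}") }
  }
}
```

Because every output sees every element and filters independently, an element whose set is empty is simply dropped, and a slow output can still backpressure the shared Broadcast.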

First, implement your function to create the Partition:
def partitionFunction4[A](func: A => Int)(implicit builder: GraphDSL.Builder[NotUsed]) = {
// partition with 4 output ports
builder.add(Partition[A](4, inputElement => func(inputElement)))
}
Then create another function that builds a Sink around a log function, which is used to print each element to the console:
def stream[A](log: A => Unit) = Flow.fromFunction[A, A](el => {
log(el)
el
} ).to(Sink.ignore)
Connect all the elements in the graph function:
def graph[A](src: Source[A, NotUsed])
(func4: A => Int, log: Int => A => Unit) = {
RunnableGraph
.fromGraph(GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val partition4 = partitionFunction4(func4)
/** Four log functions, one per sink **/
val flowSet0 = (0 until 4).map(in => log(in))
src ~> partition4.in
partition4.out(0) ~> stream(flowSet0(0))
partition4.out(1) ~> stream(flowSet0(1))
partition4.out(2) ~> stream(flowSet0(2))
partition4.out(3) ~> stream(flowSet0(3))
ClosedShape
})
.run()
}
Create a Source that emits five Int elements. The partition function is "element % 4". Depending on its result, the element is routed to the corresponding sink:
val source1: Source[Int, NotUsed] = Source(0 to 4)
graph[Int](source1)(f1 => f1 % 4,
in => {
el =>
println(s"Stream ${in} element ${el}")
})
Obtaining as result:
Stream 0 element 0
Stream 1 element 1
Stream 2 element 2
Stream 3 element 3
Stream 0 element 4
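One caveat with "element % 4" as a partition function: in Scala, % keeps the sign of the dividend, so negative elements produce negative indices. As far as I know, Partition fails the stream when the function returns an index outside 0 until outputPorts. A small sketch of a safer index function follows; portFor is a hypothetical name, not from the answer.

```scala
object SafeIndex {
  // Hypothetical helper: maps any Int, including negatives, to an index in 0 until ports.
  def portFor(element: Int, ports: Int): Int = Math.floorMod(element, ports)
}
```

It could be passed to the graph above in place of f1 => f1 % 4, for example as f1 => SafeIndex.portFor(f1, 4).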

Related

Akka - convert Flow into Collection or Publisher

I'm trying to split an Akka Source into two separate ones.
val requestFlow = Flow[BodyPartEntity].to(Sink.seq) // convert to Seq[BodyPartEntity]
val dataFlow = Flow[BodyPartEntity].to(Sink.asPublisher(fanout = false)) // convert to Publisher[BodyPartEntity]
implicit class EitherSourceExtension[L, R, Mat](source: Source[FormData.BodyPart, Mat]) {
def partition(left: Sink[BodyPartEntity, NotUsed], right: Sink[BodyPartEntity, NotUsed]): Graph[ClosedShape, NotUsed] = {
GraphDSL.create() { implicit builder =>
import akka.stream.scaladsl.GraphDSL.Implicits._
val partition = builder.add(Partition[FormData.BodyPart](2, element => if (element.getName == "request") 0 else 1))
source ~> partition.in
partition.out(0).map(_.getEntity) ~> left
partition.out(1).map(_.getEntity) ~> right
ClosedShape
}
}
}
How can I convert requestFlow into a Seq[BodyPartEntity] and dataFlow into a Publisher[BodyPartEntity]?

You could use a BroadcastHub for this. From the documentation:
A BroadcastHub can be used to consume elements from a common producer by a dynamic set of consumers.
Simplified code:
val runnableGraph: RunnableGraph[Source[Int, NotUsed]] =
Source(1 to 5).toMat(
BroadcastHub.sink(bufferSize = 4))(Keep.right)
val fromProducer: Source[Int, NotUsed] = runnableGraph.run()
// Process the messages from the producer in two independent consumers
fromProducer.runForeach(msg => println("consumer1: " + msg))
fromProducer.runForeach(msg => println("consumer2: " + msg))

Is it possible to create a Flow with Akka-stream that can switch between 2 different inner Shapes?

I would like to have a complex Flow that I can switch inside it between 2 different Shapes depending on the data that is flowing into the graph. When we return a ClosedShape the graph is static but when we return FlowShape I was wondering if it is possible to create some kind of dynamic flow inside it. I was looking at this question and it seems that they use a Partition which I don't know how to apply or if it actually solves my problem.
I started with this example and I am stuck on the comment in the code.
import akka.actor.ActorSystem
import akka.stream.FlowShape
import akka.stream.scaladsl.{Flow, GraphDSL, Sink, Source}
import scala.concurrent.duration._
object StreamOpenGraphsWithMultipleFlows extends App {
run()
def run() = {
implicit val system = ActorSystem("StreamOpenGraphsWithMultipleFlows")
val fastSource = Source(1 to 1000).throttle(50, 1 second)
val slowSource = Source(1 to 1000).throttle(5, 1 second)
val INC = 5
val MULTI = 10
val DIVIDE = 2
val incrementer = Flow[Int].map { x =>
val result = x + INC
print(s" | incrementing $x + $INC -> $result")
result
}
val multiplier = Flow[Int].map { x =>
val result = x * MULTI
print(s" | multiplying $x * $MULTI -> $result")
result
}
val divider = Flow[Int].map { x =>
val result = x / DIVIDE
print(s" | dividing $x / $DIVIDE -> $result")
result
}
def isMultipleOf(value: Int, multiple: Int): Boolean = (value % multiple) == 0
// Step 1 - setting up the fundamental for a stream graph
val complexFlowIncrementer = Flow.fromGraph(
GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
// Step 2 - add necessary components of this graph
val incrementerShape = builder.add(incrementer)
val multiplierShape = builder.add(multiplier)
// println(s"builder.materializedValue: ${builder.materializedValue}")
// Step 3 - tying up the components
incrementerShape ~> multiplierShape
// BUT I WOULD LIKE TO DO SOMETHING AS BELOW
// if (isMultipleOf(value???, 10)) incrementerShape ~> divider
// else incrementerShape ~> multiplierShape
// Step 4 - return the shape
FlowShape(incrementerShape.in, multiplierShape.out)
}
)
// run the graph and materialize it
val graph = slowSource
.via(complexFlowIncrementer)
.to(Sink.foreach(x => println(s" | result: $x")))
graph.run()
}
}

This blog post shows code samples for achieving that, so in your case you'd need something along these lines:
val complexFlowIncrementer = Flow.fromGraph(
GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
// Step 2 - add necessary components of this graph
val incrementerShape = builder.add(incrementer)
val multiplierShape = builder.add(multiplier)
val dividerShape = builder.add(divider)
// add partition and merge (Partition and Merge must be imported from akka.stream.scaladsl)
val partition = builder.add(Partition[Int](2, x => if (isMultipleOf(x, 10)) 0 else 1))
val merge = builder.add(Merge[Int](2))
// println(s"builder.materializedValue: ${builder.materializedValue}")
// Step 3 - tying up the components
incrementerShape ~> partition
partition.out(0) ~> dividerShape ~> merge.in(0)
partition.out(1) ~> multiplierShape ~> merge.in(1)
// Step 4 - return the shape
FlowShape(incrementerShape.in, merge.out)
}
)
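To make the per-element behaviour of this Partition/Merge wiring explicit, here is a plain-function model of it. This is a sketch written for this explanation, not taken from the answer; it assumes the constants from the question and note that the partition sits after the incrementer, so the test applies to the incremented value.

```scala
object RoutingSemantics {
  val INC = 5
  val MULTI = 10
  val DIVIDE = 2

  def isMultipleOf(value: Int, multiple: Int): Boolean = value % multiple == 0

  // Models incrementer ~> partition ~> (divider | multiplier) ~> merge for one element.
  def process(x: Int): Int = {
    val incremented = x + INC
    if (isMultipleOf(incremented, 10)) incremented / DIVIDE // branch 0: divider
    else incremented * MULTI                                // branch 1: multiplier
  }
}
```

For example, process(5) gives 5 (10 is a multiple of 10, so it is divided by 2) and process(1) gives 60. The model says nothing about ordering: Merge emits from whichever branch has an element ready.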

Difficulty with Scatter and Gather pattern with Akka Streams

I have an object like this
case class Foo(id: Int, id1: Option[Int], id2: Option[Int])
Here id1 and id2 are obtained in two separate lookups. So first I scatter using the broadcast and then I do a gather using a merge and a groupBy.
The code I have written is
val source = Source(List(Foo(1), Foo(2), Foo(3), Foo(4)))
val flow1 = Flow[Foo].map(foo => foo.copy(id1 = Some(Random.nextInt())))
val flow2 = Flow[Foo].map(foo => foo.copy(id2 = Some(Random.nextInt())))
val flow3 = Flow[Foo].groupBy(100, foo => foo.id)
val flow4 = Flow[Foo].reduce{case (foo, fooLookup) =>
if (fooLookup.id1.isDefined) foo.copy(id1 = fooLookup.id1)
if (fooLookup.id2.isDefined) foo.copy(id2 = fooLookup.id2)
else foo
}
val sink = Sink.foreach[Foo](println)
val graph = RunnableGraph.fromGraph(GraphDSL.create(sink) { implicit builder =>
s =>
import GraphDSL.Implicits._
val b = builder.add(Broadcast[Foo](2))
val m = builder.add(Merge[Foo](2))
source ~> b
b ~> flow1 ~> m
b ~> flow2 ~> m
m ~> flow3.mergeSubStreams ~> flow4 ~> s.in
ClosedShape
})
This doesn't compile because the compiler rejects flow3.mergeSubStreams.
My final goal is for the lookups of id1 and id2 to happen on two separate branches, and then to merge the results and print the final object, which has id, id1 and id2.
Edit: Another question: since I split the stream in two, the reduce function should move on once it has processed 2 Foos. Right now it seems that reduce waits for the entire stream to end, because it doesn't know how many Foos it will receive. Is there a way to tell the reducer to pass its result on to the sink once it has received a certain number of records?
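Independently of the stream wiring, the combining step itself can be sketched with plain collections. This is only an illustration under assumptions: Foo is given default values so that Foo(1) compiles, and combine and gather are hypothetical names. Note that in the reduce shown in the question, the result of the first if is discarded, so only the id2 branch can take effect.

```scala
object ScatterGatherModel {
  // Assumption: defaults added so that Foo(1) is a valid constructor call.
  final case class Foo(id: Int, id1: Option[Int] = None, id2: Option[Int] = None)

  // Merge two partial results for the same id, keeping whichever lookups are defined.
  def combine(a: Foo, b: Foo): Foo =
    a.copy(id1 = a.id1.orElse(b.id1), id2 = a.id2.orElse(b.id2))

  // Collection model of groupBy(id) + reduce + mergeSubstreams.
  def gather(partials: Seq[Foo]): Seq[Foo] =
    partials.groupBy(_.id).toSeq.sortBy(_._1).map { case (_, group) => group.reduce(combine) }
}
```

In a stream, bounding each substream before the reduce (for example with take(2), since each id yields exactly two partial results here) would presumably let the reduce emit without waiting for upstream completion, but that is an untested suggestion.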

Akka Streams scala DSL and Op-Rabbit

I have started using Akka Streams and Op-Rabbit and am a bit confused.
I need to split the stream based on a predicate and then combine them much like I have done when creating graphs and using the Partition and Merge.
I have been able to do things like this using the GraphDSL.Builder, but can't seem to get it to work with AckedSource/Flow/Sink.
The graph would look like:
                        | --> flow1 --> |
source--> partition --> |               | --> flow3 --> sink
                        | --> flow2 --> |
I'm not sure if splitWhen is what I should use because I always need exactly 2 flows.
This is a sample that does not do the partitioning and does not use the GraphDSL.Builder:
def splitExample(source: AckedSource[String, SubscriptionRef],
queueName: String)
(implicit actorSystem: ActorSystem): RunnableGraph[SubscriptionRef] = {
val toStringFlow: Flow[AckTup[Message], AckTup[String], NotUsed] = Flow[AckTup[Message]]
.map[AckTup[String]](tup => {
val (p,m) = tup
(p, new String(m.data))
})
val printFlow1: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]]
.map[AckTup[String]](tup => {
val (p, s) = tup
println(s"flow1 processing $s")
tup
})
val printFlow2: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]]
.map[AckTup[String]](tup => {
val (p, s) = tup
println(s"flow2 processing $s")
tup
})
source
.map(Message.queue(_, queueName))
.via(AckedFlow(toStringFlow))
// partition if string.length < 10
.via(AckedFlow(printFlow1))
.via(AckedFlow(printFlow2))
.to(AckedSink.ack)
}
This is the code that I can't seem to get working:
import GraphDSL.Implicits._
def buildModelAcked(source: AckedSource[String, SubscriptionRef] , queueName: String)(implicit actorSystem: ActorSystem): Graph[ClosedShape, Future[Done]] = {
import GraphDSL.Implicits._
GraphDSL.create(Sink.ignore) { implicit builder: GraphDSL.Builder[Future[Done]] => s =>
import GraphDSL.Implicits._
source.map(Message.queue(_, queueName)) ~> AckedFlow(toStringFlow) ~> AckedSink.ack
// source.map(Message.queue(_, queueName)).via(AckedFlow(toStringFlow)).to(AckedSink.ack)
ClosedShape
}}
The compiler can't resolve the ~> operator
So my questions are:
Is there an example project that uses the scala dsl to build graphs of Acked/Source/Flow/Sink?
Is there an example project that partitions and merges that is similar to what I am trying to do here?

Keep in mind the following definitions when dealing with acked-stream.
AckedSource[Out, Mat] is a wrapper for Source[AckTup[Out], Mat]
AckedFlow[In, Out, Mat] is a wrapper for Flow[AckTup[In], AckTup[Out], Mat]
AckedSink[In, Mat] is a wrapper for Sink[AckTup[In], Mat]
AckTup[T] is an alias for (Promise[Unit], T)
The classic flow combinators operate on the T part of the AckTup.
The .acked combinator completes the Promise[Unit] of an AckedFlow.
The GraphDSL edge operator (~>) will work against a bunch of Akka predefined shapes (see the code for GraphDSL.Implicits), but it won't work against custom shapes defined by the acked-stream lib.
You have two ways out:
Define your own ~> implicit operator, along the lines of the ones in GraphDSL.Implicits.
Unwrap the acked stages to obtain standard stages. You can access the wrapped stage using .wrappedRepr, which is available on AckedSource, AckedFlow and AckedSink.
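The AckTup definition above can be modelled with the standard library alone, which may help when working on the unwrapped stages. This is a sketch, not acked-stream's actual code; mapPayload and ack are hypothetical helper names.

```scala
import scala.concurrent.Promise

object AckTupSketch {
  // Same shape as the alias described above.
  type AckTup[T] = (Promise[Unit], T)

  // What a combinator on the T part does: transform the payload, carry the promise along.
  def mapPayload[A, B](tup: AckTup[A])(f: A => B): AckTup[B] = (tup._1, f(tup._2))

  // Roughly what acknowledging amounts to in this model: complete the promise, return the payload.
  def ack[T](tup: AckTup[T]): T = {
    tup._1.trySuccess(())
    tup._2
  }
}
```

The point of the model is that a plain Flow over AckTup must always pass the promise through, otherwise the message can never be acknowledged downstream.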
Based on Stefano Bonetti's excellent direction, here is a possible solution:
graph:
                      |--> short --|
rabbitMq --> before --|            |--> after
                      |--> long  --|
solution:
val before: Flow[AckTup[Message], AckTup[String], NotUsed] = Flow[AckTup[Message]].map[AckTup[String]](tup => {
val (p,m) = tup
(p, new String(m.data))
})
val short: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]].map[AckTup[String]](tup => {
val (p, s) = tup
println(s"short: $s")
tup
})
val long: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]].map[AckTup[String]](tup => {
val (p, s) = tup
println(s"long: $s")
tup
})
val after: Flow[AckTup[String], AckTup[String], NotUsed] = Flow[AckTup[String]].map[AckTup[String]](tup => {
val (p, s) = tup
println(s"all $s")
tup
})
def buildSplitGraph(source: AckedSource[String, SubscriptionRef]
, queueName: String
, splitLength: Int)(implicit actorSystem: ActorSystem): Graph[ClosedShape, Future[Done]] = {
GraphDSL.create(Sink.ignore) { implicit builder: GraphDSL.Builder[Future[Done]] => s =>
val toShort = 0
val toLong = 1
// junctions
val split = builder.add(Partition[AckTup[String]](2, (tup: AckTup[String]) => {
val (p, s) = tup
if (s.length < splitLength) toShort else toLong
}
))
val merge = builder.add(Merge[AckTup[String]](2))
//graph
val beforeSplit = source.map(Message.queue(_, queueName)).wrappedRepr ~> AckedFlow(before).wrappedRepr
beforeSplit ~> split
// must do short, then long since the split goes in that order
split ~> AckedFlow(short).wrappedRepr ~> merge
split ~> AckedFlow(long).wrappedRepr ~> merge
// after the last AckedFlow, be sure to '.acked' so that the message will be removed from the queue
merge ~> AckedFlow(after).acked ~> s
ClosedShape
}}
As Stefano Bonetti said, the key was to use the .wrappedRepr associated with the AckedFlow and then to use the .acked combinator as the last step.

Akka Streams - Combine different Sources

I have an object that builds different Flows. Each flow has filters that can discard values, so the final result may contain only a subset of the original source.
The code:
object RawFlowGeneratorByVehicle {
val deviceEventFilter = (de : DeviceEvent) => de.isValidPosition : Boolean
def buildSpeedFlow(vehicles : List[Vehicle]) : VEHICLERAWFLOW = {
Flow[DeviceEvent].filter(deviceEventFilter)
.groupBy(vehicles.length,de => de.getModemId)
.reduce((a, b) => if(a.getGenerationDate >= b.getGenerationDate) a else b)
.mergeSubstreams
.map(de => VehicleFlowResult(de.getModemId,"Speed",de.getSpeed))
}
def buildCountFlow(vehicles: List[Vehicle], maxSpeed : Double) : VEHICLERAWFLOW = {
Flow[DeviceEvent].filter(deviceEventFilter)
.groupBy(vehicles.length,de => de.getModemId)
.filter(de => de.getSpeed > maxSpeed)
.map(_ -> 1)
.reduce((l, r) => (l._1, l._2 + r._2))
.mergeSubstreams
.map(a => VehicleFlowResult(a._1.getModemId, "SpeedCount", a._2))
}
//...Other flows
}
After building the flows, I merge them in a graph, and the final result is a CSV file. This is the object with the graph:
object RunnableFlows {
def rawGraph(in: Source[DeviceEvent, NotUsed], flows: List[VEHICLERAWFLOW]): Source[VehicleFlowResult, NotUsed] = {
val g = Source.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val bcast = builder.add(Broadcast[DeviceEvent](flows.length))
val merge = builder.add(Merge[VehicleFlowResult](flows.length))
in ~> bcast ~> flows.head ~> merge
for (curFlow <- flows.tail) {
bcast ~> curFlow ~> merge
}
SourceShape(merge.out)
})
g
}
}
The flows may emit different numbers of elements, so I don't know how to merge, concat or zip them to generate a CSV that covers every vehicle in the vehicles list (the list has no duplicates), with default values for any vehicle that does not pass the flows' filters.
The CSV must look something like this:
imei;name;event;value
aaa;vehicle1;Event1;100
aaa;vehicle1;Event2;100
bbb;vehicle2;DefaultEvent;defaultValue
ccc;vehicle3;Event5;89
Thanks!!
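One way to read the requirement is as a left join of the vehicles list against the collected flow results. The sketch below uses plain collections and assumes the results have already been gathered (for example from a finite stream); Vehicle, Result and toCsv are simplified stand-ins invented for illustration, not the types from the question.

```scala
object CsvDefaults {
  // Simplified, hypothetical stand-ins for the question's Vehicle and VehicleFlowResult.
  final case class Vehicle(imei: String, name: String)
  final case class Result(imei: String, event: String, value: String)

  // One row per result; vehicles without any result get a single default row.
  def toCsv(vehicles: Seq[Vehicle], results: Seq[Result]): Seq[String] = {
    val byImei = results.groupBy(_.imei)
    val rows = vehicles.flatMap { v =>
      byImei.get(v.imei) match {
        case Some(rs) => rs.map(r => s"${v.imei};${v.name};${r.event};${r.value}")
        case None     => Seq(s"${v.imei};${v.name};DefaultEvent;defaultValue")
      }
    }
    "imei;name;event;value" +: rows
  }
}
```

Driving the output from the vehicles list rather than from the merged stream is what guarantees that every vehicle appears at least once.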