How can I merge an arbitrary number of sources in Akka stream? - scala

I have n sources that I'd like to merge by priority in Akka Streams. I'm basing my implementation on GraphMergePrioritizedSpec, in which three prioritized sources are merged. I attempted to abstract over the number of sources with the following:
import akka.NotUsed
import akka.stream.{ClosedShape, Graph, Materializer}
import akka.stream.scaladsl.{GraphDSL, MergePrioritized, RunnableGraph, Sink, Source}
import org.apache.activemq.ActiveMQConnectionFactory
class SourceMerger(
    sources: Seq[Source[java.io.Serializable, NotUsed]],
    priorities: Seq[Int],
    private val sink: Sink[java.io.Serializable, _]
) {
  require(sources.size == priorities.size, "Each source should have a priority")
  import GraphDSL.Implicits._
  private def partial(
      sources: Seq[Source[java.io.Serializable, NotUsed]],
      priorities: Seq[Int],
      sink: Sink[java.io.Serializable, _]
  ): Graph[ClosedShape, NotUsed] = GraphDSL.create() { implicit b =>
    val merge = b.add(MergePrioritized[java.io.Serializable](priorities))
    sources.zipWithIndex.foreach { case (s, i) =>
      s.shape.out ~> merge.in(i)
    }
    merge.out ~> sink
    ClosedShape
  }
  def merge(
      sources: Seq[Source[java.io.Serializable, NotUsed]],
      priorities: Seq[Int],
      sink: Sink[java.io.Serializable, _]
  ): RunnableGraph[NotUsed] = RunnableGraph.fromGraph(partial(sources, priorities, sink))
  def run()(implicit mat: Materializer): NotUsed = merge(sources, priorities, sink).run()(mat)
}
However, I get an error when running the following stub:
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Materializer}
import akka.stream.scaladsl.{Sink, Source}
import org.scalatest.{Matchers, WordSpecLike}
import akka.testkit.TestKit
import scala.collection.immutable.Iterable
class SourceMergerSpec extends TestKit(ActorSystem("SourceMerger")) with WordSpecLike with Matchers {
implicit val materializer: Materializer = ActorMaterializer()
"A SourceMerger" should {
"merge by priority" in {
val priorities: Seq[Int] = Seq(1,2,3)
val highPriority = Iterable("message1", "message2", "message3")
val mediumPriority = Iterable("message4", "message5", "message6")
val lowPriority = Iterable("message7", "message8", "message9")
val source1 = Source[String](highPriority)
val source2 = Source[String](mediumPriority)
val source3 = Source[String](lowPriority)
val sources = Seq(source1, source2, source3)
val subscriber = Sink.seq[java.io.Serializable]
val merger = new SourceMerger(sources, priorities, subscriber)
merger.run()
source1.runWith(Sink.foreach(println))
}
}
}
The relevant stacktrace is here:
[StatefulMapConcat.out] is already connected
java.lang.IllegalArgumentException: [StatefulMapConcat.out] is already connected
at akka.stream.scaladsl.GraphDSL$Builder.addEdge(Graph.scala:1304)
at akka.stream.scaladsl.GraphDSL$Implicits$CombinerBase$class.$tilde$greater(Graph.scala:1431)
at akka.stream.scaladsl.GraphDSL$Implicits$PortOpsImpl.$tilde$greater(Graph.scala:1521)
at SourceMerger$$anonfun$partial$1$$anonfun$apply$1.apply(SourceMerger.scala:26)
at SourceMerger$$anonfun$partial$1$$anonfun$apply$1.apply(SourceMerger.scala:25)
It seems that the error comes from this:
sources.zipWithIndex.foreach { case (s, i) =>
s.shape.out ~> merge.in(i)
}
Is it possible to merge an arbitrary number of Sources in Akka streams Graph DSL? If so, why isn't my attempt successful?

Primary Problem with Code Example
One big issue with the code snippet in the question is that source1 is connected both to the Sink passed to the merge call and to Sink.foreach(println). The same Source cannot be connected to multiple Sinks without an intermediate fan-out element.
Removing the Sink.foreach(println) may solve your problem outright.
Simplified Design
The merging can be simplified based on the fact that all messages from a particular Source have the same priority. This means that you can sort the sources by their respective priority and then concatenate them all together:
private def partial(sources: Seq[Source[java.io.Serializable, NotUsed]],
                    priorities: Seq[Int],
                    sink: Sink[java.io.Serializable, _]): RunnableGraph[NotUsed] =
  sources.zip(priorities)
    .sortWith(_._2 < _._2)
    .map(_._1)
    .reduceOption(_ ++ _)
    .getOrElse(Source.empty[java.io.Serializable])
    .to(sink)
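A self-contained sketch of the same concatenation idea, for finite sources only: concatenation starts the next source only after the previous one has completed, so elements are not interleaved the way MergePrioritized interleaves them. The names `ConcatByPriority` and `concatByPriority` are made up for illustration, and the lowest priority number is assumed to go first, as in the snippet above.

```scala
import akka.NotUsed
import akka.stream.scaladsl.Source
import scala.collection.immutable

object ConcatByPriority {
  // Sort the sources by ascending priority number, then chain them
  // one after another. An empty input yields an empty source.
  def concatByPriority[T](
      sources: immutable.Seq[Source[T, NotUsed]],
      priorities: immutable.Seq[Int]
  ): Source[T, NotUsed] = {
    require(sources.size == priorities.size, "Each source should have a priority")
    sources.zip(priorities)
      .sortBy(_._2)
      .map(_._1)
      .foldLeft(Source.empty[T])((acc, next) => acc.concat(next))
  }
}
```

The result is an ordinary Source, so it can be attached to any sink with `.to(sink)` or `.runWith(sink)`.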

Your code runs without the error if I replace
sources.zipWithIndex.foreach { case (s, i) =>
s.shape.out ~> merge.in(i)
}
with
sources.zipWithIndex.foreach { case (s, i) =>
s ~> merge.in(i)
}
I admit I'm not quite sure why! At any rate, s.shape is a StatefulMapConcat and that's the point where it's complaining about the out port already being connected. The problem occurs even if you only pass a single source, so the arbitrary number isn't the problem.
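For reference, a minimal end-to-end sketch of this fix. A plausible explanation is that `s ~> ...` first imports a copy of the source into the builder (as `b.add(s)` would) and connects the copy's outlet, whereas `s.shape.out` is the port of the original blueprint, which the builder has not imported. The names `MergeManySources` and `mergeByPriority` are made up for illustration, and `Sink.seq` stands in for the real sink.

```scala
import akka.NotUsed
import akka.stream.ClosedShape
import akka.stream.scaladsl.{GraphDSL, MergePrioritized, RunnableGraph, Sink, Source}
import scala.collection.immutable
import scala.concurrent.Future

object MergeManySources {
  // Builds a runnable graph that merges any number of sources by priority
  // and collects the merged elements into a sequence.
  def mergeByPriority[T](
      sources: immutable.Seq[Source[T, NotUsed]],
      priorities: immutable.Seq[Int]
  ): RunnableGraph[Future[immutable.Seq[T]]] = {
    require(sources.size == priorities.size, "Each source should have a priority")
    RunnableGraph.fromGraph(GraphDSL.create(Sink.seq[T]) { implicit b => sink =>
      import GraphDSL.Implicits._
      val merge = b.add(MergePrioritized[T](priorities))
      sources.zipWithIndex.foreach { case (s, i) =>
        // `s ~>` adds the source to the builder before wiring it up
        s ~> merge.in(i)
      }
      merge.out ~> sink
      ClosedShape
    })
  }
}
```

The emission order of MergePrioritized depends on which inputs have elements available, so code that checks the result should compare contents rather than exact order.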

Related

How to combine data from two kafka topics ZStreams to one ZStream?

import org.slf4j.LoggerFactory
import zio.blocking.Blocking
import zio.clock.Clock
import zio.console.{Console, putStrLn}
import zio.kafka.consumer.{CommittableRecord, Consumer, ConsumerSettings, Subscription}
import zio.kafka.consumer.Consumer.{AutoOffsetStrategy, OffsetRetrieval}
import zio.kafka.serde.Serde
import zio.stream.ZStream
import zio.{ExitCode, Has, URIO, ZIO, ZLayer}
object Test2Topics extends zio.App {
val logger = LoggerFactory.getLogger(this.getClass)
val consumerSettings: ConsumerSettings =
ConsumerSettings(List("localhost:9092"))
.withGroupId(s"consumer-${java.util.UUID.randomUUID().toString}")
.withOffsetRetrieval(OffsetRetrieval.Auto(AutoOffsetStrategy.Earliest))
val consumer: ZLayer[Clock with Blocking, Throwable, Has[Consumer]] =
ZLayer.fromManaged(Consumer.make(consumerSettings))
val streamString: ZStream[Any with Has[Consumer], Throwable, CommittableRecord[String, String]] =
Consumer.subscribeAnd(Subscription.topics("test"))
.plainStream(Serde.string, Serde.string)
val streamInt: ZStream[Any with Has[Consumer], Throwable, CommittableRecord[String, String]] =
Consumer.subscribeAnd(Subscription.topics("topic"))
.plainStream(Serde.string, Serde.string)
val combined = streamString.zipWithLatest(streamInt)((a,b)=>(a,b))
val program = for {
fiber1 <- streamInt.tap(r => putStrLn(s"streamInt: ${r.toString}")).runDrain.forkDaemon
fiber2 <- streamString.tap(r => putStrLn(s"streamString: ${r.toString}")).runDrain.forkDaemon
} yield ZIO.raceAll(fiber1.join, List(fiber2.join))
override def run(args: List[String]): URIO[zio.ZEnv, ExitCode] = {
//combined.tap(r => putStrLn(s"Combined: ${r.toString}")).runDrain.provideSomeLayer(consumer ++ Console.live).exitCode
program.provideSomeLayer(consumer ++ Console.live).exitCode
}
}
Somehow, when I try to combine the output from the two topics named test and topic, nothing is printed. Printing both streams in parallel doesn't work either, but printing just one stream at a time does.
Has anyone experienced anything like this?
You are composing one shared layer that provides a single consumer instance, and then initializing that instance twice, one after the other, to subscribe to the two topics.
A single consumer instance should only be initialized once, so the above code will never work.
I believe setting up two independent compositions of consumer and stream, like this, will help:
val program = for {
fiber1 <- streamInt.tap(r => putStrLn(s"streamInt: ${r.toString}")).runDrain.forkDaemon.provideSomeLayer(consumer)
fiber2 <- streamString.tap(r => putStrLn(s"streamString: ${r.toString}")).runDrain.forkDaemon.provideSomeLayer(consumer)
} yield {...}

How to reduce `.via()` down a Seq of custom GraphStage?

I have concrete subclass of GraphStage that defines some custom logic that is influenced by the class parameters.
I would like users of my application to be able to supply a Seq of these custom GraphStages. When building the RunnableGraph I would like to add edges between the Source and the first stage in the Seq, then between each stage in order, and finally the Sink. In other words: src ~> stages.reduce(_ ~> _) ~> sink
Unfortunately this doesn't compile. I think the reason might be related to operator precedence. I tried being more explicit using .via or .foldLeft but I couldn't quite get it right.
This feels like this kind of thing should have a fairly straightforward syntax. Am I missing an operator in the docs? Is this kind of dynamic graph not possible for some reason?
Below is a fabricated example of this pattern using simple stages of String => String. It includes my incompilable code that logically represents the graph I want to express.
import akka.NotUsed
import akka.stream.scaladsl.{GraphDSL, RunnableGraph, Sink, Source}
import akka.stream.stage.{GraphStage, GraphStageLogic}
import akka.stream._
import scala.concurrent.Future
case class MyStage[T](/* ... params ... */) extends GraphStage[FlowShape[T, T]] {
val in = Inlet[T]("MyStage.in")
val out = Outlet[T]("MyStage.out")
val shape: FlowShape[T, T] = FlowShape.of(in, out)
def createLogic(inheritedAttributes: Attributes): GraphStageLogic = ??? // Depends on params
}
case class MyApp(stages: Seq[MyStage[String]]) {
val out = Sink.seq[String]
val graph = RunnableGraph.fromGraph(GraphDSL.create(out) { implicit b: GraphDSL.Builder[Future[Seq[String]]] =>
sink =>
import GraphDSL.Implicits._
val src: Source[String, NotUsed] = Source(Seq("abc", "hello world", "goodbye!"))
// This is what I logically want to do.
src ~> stages.reduce(_ ~> _) ~> sink
ClosedShape
}
}
You can create a flow from your stages like this:
val graph = GraphDSL.create() { implicit b =>
  import GraphDSL.Implicits._
  val stagesShapes = stages.map(b.add(_))
  stagesShapes.reduce { (s1, s2) =>
    s1 ~> s2
    FlowShape(s1.in, s2.out)
  }
}
Then all you need to do is connect the source and the sink to this flow and run it.
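An alternative that avoids the GraphDSL entirely is to fold the stages with `via`, since any GraphStage with a FlowShape can be passed to `via`. This is only a sketch: `ChainStages` and `chain` are hypothetical names, and plain `Flow[String].map(...)` values can stand in for the custom MyStage instances when trying it out.

```scala
import akka.NotUsed
import akka.stream.{FlowShape, Graph}
import akka.stream.scaladsl.Flow
import scala.collection.immutable

object ChainStages {
  // Starts from the identity flow and appends each stage in order.
  // An empty list of stages yields the identity flow.
  def chain[T](
      stages: immutable.Seq[Graph[FlowShape[T, T], NotUsed]]
  ): Flow[T, T, NotUsed] =
    stages.foldLeft(Flow[T])((flow, stage) => flow.via(stage))
}
```

With this, the graph from the question could be written as `src.via(ChainStages.chain(stages)).toMat(out)(Keep.right)`, keeping the sink's materialized Future.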

Does Akka Stream Implement the Join Semantic as Kafka Streams Does?

I am quite new to Akka Streams, whereas I have some experience with Kafka Streams.
One thing that seems to be lacking in Akka Streams is the ability to join two different streams.
Kafka Streams allows joining information coming from two different streams (or tables) using the messages' keys.
Is there something similar in Akka Streams?
The short answer is unfortunately no. I would argue that Akka Streams is more low-level than Kafka Streams, Spark Streaming, or Flink. In exchange, you have more control over what you are doing: you can build your own join operator. Check this discussion at Lightbend.
In outline, you take data from two Sources, Merge them, send the merged elements to a window based on time or on a number of tuples, compute the join, and emit the result to the Sink. I have written the PoC below (still unfinished) following those steps, and it compiles and runs. I still have to join the data inside the window; currently I just emit the elements in mini-batches.
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{Attributes, ClosedShape, FlowShape, Inlet, Outlet}
import akka.stream.scaladsl.{Flow, GraphDSL, Merge, RunnableGraph, Sink, Source}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler, TimerGraphStageLogic}
import scala.collection.mutable
import scala.concurrent.duration._
object StreamOpenGraphJoin {
def main(args: Array[String]): Unit = {
implicit val system = ActorSystem("StreamOpenGraphJoin")
val incrementSource: Source[Int, NotUsed] = Source(1 to 10).throttle(1, 1 second)
val decrementSource: Source[Int, NotUsed] = Source(10 to 20).throttle(1, 1 second)
def tokenizerSource(key: Int) = {
Flow[Int].map { value =>
(key, value)
}
}
// Step 1 - setting up the fundamental for a stream graph
val switchJoinStrategies = RunnableGraph.fromGraph(
GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
// Step 2 - add partition and merge strategy
val tokenizerShape00 = builder.add(tokenizerSource(0))
val tokenizerShape01 = builder.add(tokenizerSource(1))
val mergeTupleShape = builder.add(Merge[(Int, Int)](2))
val batchFlow = Flow.fromGraph(new BatchTimerFlow[(Int, Int)](5 seconds))
val sinkShape = builder.add(Sink.foreach[(Int, Int)](x => println(s" > sink: $x")))
// Step 3 - tying up the components
incrementSource ~> tokenizerShape00 ~> mergeTupleShape.in(0)
decrementSource ~> tokenizerShape01 ~> mergeTupleShape.in(1)
mergeTupleShape.out ~> batchFlow ~> sinkShape
// Step 4 - return the shape
ClosedShape
}
)
// run the graph and materialize it
val graph = switchJoinStrategies.run()
}
// step 0: define the shape
class BatchTimerFlow[T](silencePeriod: FiniteDuration) extends GraphStage[FlowShape[T, T]] {
// step 1: define the ports and the component-specific members
val in = Inlet[T]("BatchTimerFlow.in")
val out = Outlet[T]("BatchTimerFlow.out")
// step 3: create the logic
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new TimerGraphStageLogic(shape) {
// mutable state
val batch = new mutable.Queue[T]
var open = false
// step 4: define mutable state implement my logic here
setHandler(in, new InHandler {
override def onPush(): Unit = {
try {
val nextElement = grab(in)
batch.enqueue(nextElement)
Thread.sleep(50) // simulate an expensive computation
if (open) pull(in) // send demand upstream signal, asking for another element
else {
// forward the element to the downstream operator
emitMultiple(out, batch.dequeueAll(_ => true).to[collection.immutable.Iterable])
open = true
scheduleOnce(None, silencePeriod)
}
} catch {
case e: Throwable => failStage(e)
}
}
})
setHandler(out, new OutHandler {
override def onPull(): Unit = {
pull(in)
}
})
override protected def onTimer(timerKey: Any): Unit = {
open = false
}
}
// step 2: construct a new shape
override def shape: FlowShape[T, T] = FlowShape[T, T](in, out)
}
}
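The missing join step could be sketched as a pure function over one window's batch. This is an assumption-laden illustration rather than part of the PoC: it assumes each element carries a join key and is tagged as coming from the left or right stream (modelled here with `Either`), and it performs an inner join. `WindowJoin` and `joinWindow` are hypothetical names.

```scala
object WindowJoin {
  // Inner join of one window: every left element is paired with every
  // right element in the same batch that has the same key.
  def joinWindow[K, A, B](batch: Seq[Either[(K, A), (K, B)]]): Seq[(K, A, B)] = {
    val lefts: Seq[(K, A)] = batch.collect { case Left(kv) => kv }
    // Index the right-hand elements by key for lookup
    val rightsByKey: Map[K, Seq[B]] =
      batch.collect { case Right(kv) => kv }
        .groupBy(_._1)
        .map { case (k, kvs) => k -> kvs.map(_._2) }
    for {
      (k, a) <- lefts
      b <- rightsByKey.getOrElse(k, Seq.empty)
    } yield (k, a, b)
  }
}
```

Batches for such a function could come from the built-in `groupedWithin(n, duration)` operator applied after the Merge, with `mapConcat` flattening the joined results back into the stream.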

akka stream integrating akka-http web request call into stream

Getting started with Akka Streams I want to perform a simple computation. Extending the basic QuickStart https://doc.akka.io/docs/akka/2.5/stream/stream-quickstart.html with a call to a restful web api:
val source: Source[Int, NotUsed] = Source(1 to 100)
source.runForeach(println)
already works nicely to print the numbers. But when trying to create an Actor to perform the HTTP request (is this actually necessary?) according to https://doc.akka.io/docs/akka/2.5.5/scala/stream/stream-integrations.html
import akka.pattern.ask
implicit val askTimeout = Timeout(5.seconds)
val words: Source[String, NotUsed] =
Source(List("hello", "hi"))
words
.mapAsync(parallelism = 5)(elem => (ref ? elem).mapTo[String])
// continue processing of the replies from the actor
.map(_.toLowerCase)
.runWith(Sink.ignore)
I cannot get it to compile, as the ? operator is not defined. As far as I know, it would only be defined inside an actor.
I also do not understand yet where exactly inside mapAsync my custom actor needs to be called.
edit
https://blog.colinbreck.com/backoff-and-retry-error-handling-for-akka-streams/ contains at least parts of an example.
It looks like it is not mandatory to create an actor i.e.
implicit val system = ActorSystem()
implicit val ec = system.dispatcher
implicit val materializer = ActorMaterializer()
val source = Source(List("232::03::14062::19965186", "232::03::14062::19965189"))
.map(cellKey => {
val splits = cellKey.split("::")
val mcc = splits(0)
val mnc = splits(1)
val lac = splits(2)
val ci = splits(3)
CellKeySource(cellKey, mcc, mnc, lac, ci)
})
.limit(2)
.mapAsyncUnordered(2)(ck => getResponse(ck.cellKey, ck.mobileCountryCode, ck.mobileNetworkCode, ck.locationArea, ck.cellKey)("<<myToken>>"))
def getResponse(cellKey: String, mobileCountryCode:String, mobileNetworkCode:String, locationArea:String, cellId:String)(token:String): Future[String] = {
RestartSource.withBackoff(
minBackoff = 10.milliseconds,
maxBackoff = 30.seconds,
randomFactor = 0.2,
maxRestarts = 2
) { () =>
val responseFuture: Future[HttpResponse] =
Http().singleRequest(HttpRequest(uri = s"https://www.googleapis.com/geolocation/v1/geolocate?key=${token}", entity = ByteString(
// TODO use proper JSON objects
s"""
|{
| "cellTowers": [
| "mobileCountryCode": $mobileCountryCode,
| "mobileNetworkCode": $mobileNetworkCode,
| "locationAreaCode": $locationArea,
| "cellId": $cellId,
| ]
|}
""".stripMargin)))
Source.fromFuture(responseFuture)
.mapAsync(parallelism = 1) {
case HttpResponse(StatusCodes.OK, _, entity, _) =>
Unmarshal(entity).to[String]
case HttpResponse(statusCode, _, _, _) =>
throw WebRequestException(statusCode.toString() )
}
}
.runWith(Sink.head)
.recover {
case _ => throw StreamFailedAfterMaxRetriesException()
}
}
val done: Future[Done] = source.runForeach(println)
done.onComplete(_ ⇒ system.terminate())
This is already a (partial) answer to the question of how to integrate Akka Streams with akka-http. However, it does not work: it only produces HTTP 400 errors and never terminates.
I think you have already found an API for calling the akka-http client.
Regarding your first code snippet, which doesn't work: I think there was a misunderstanding of the example itself. You expected the code from the docs to work after simply copying it, but the intention of the docs was only to demonstrate a concept: how to delegate a long-running task out of the stream flow and consume the result once it is ready. An ask call to an Akka actor is used for this because ask returns a Future. The authors of the docs probably just omitted the definition of the actor. You can try this example:
import java.lang.System.exit
import akka.NotUsed
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.pattern.ask
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import akka.util.Timeout
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.language.higherKinds
object App extends scala.App {
implicit val sys: ActorSystem = ActorSystem()
implicit val mat: ActorMaterializer = ActorMaterializer()
val ref: ActorRef = sys.actorOf(Props[Translator])
implicit val askTimeout: Timeout = Timeout(5.seconds)
val words: Source[String, NotUsed] = Source(List("hello", "hi"))
words
.mapAsync(parallelism = 5)(elem => (ref ? elem).mapTo[String])
.map(_.toLowerCase)
.runWith(Sink.foreach(println))
.onComplete(t => {
println(s"finished: $t")
exit(1)
})
}
class Translator extends Actor {
override def receive: Receive = {
case msg => sender() ! s"$msg!"
}
}
You must import the ask pattern from Akka.
import akka.pattern.ask
Edit: OK, sorry, I can see that you have already imported. What is ref in your code? ActorRef?

Akka Stream - simple source/sink example inlets and outlets do not correspond

I'm starting to learn Akka Stream. I have a problem that I simplified to this:
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{GraphDSL, RunnableGraph, Sink, Source}
object Test extends App {
val graph = GraphDSL.create() { implicit b =>
val in = Source.fromIterator(() => (1 to 10).iterator.map(_.toDouble))
b.add(in)
val out = Sink.foreach[Double] { d =>
println(s"elem: $d")
}
b.add(out)
in.to(out)
ClosedShape
}
implicit val system = ActorSystem()
implicit val mat = ActorMaterializer()
val rg = RunnableGraph.fromGraph(graph)
rg.run()
}
This throws a runtime exception:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: The inlets [] and outlets [] must correspond to the inlets [map.in] and outlets [StatefulMapConcat.out]
The problem is, in my actual case I cannot use the ~> operator from GraphDSL.Implicits, because there is no common super-type of Source and Flow (my graph is created from another DSL and not in one single place). So I can only use b.add and in.to(out).
It seems one has to use a special "copy" of the outlet that one gets from builder.add:
val graph = GraphDSL.create() { implicit b =>
  val in = Source.fromIterator(() => (1 to 10).iterator.map(_.toDouble))
  val out = Sink.foreach[Double] { d =>
    println(s"elem: $d")
  }
  import GraphDSL.Implicits._
  val inOutlet = b.add(in).out
  // ... pass inOutlet around until ...
  inOutlet ~> out
  ClosedShape
}