If I have created a RunningGraph in Akka Stream, how can I know (from the outside)
when all nodes are cancelled due to completion?
when all nodes have been stopped due to an error?
I don't think there is a way to do it for an arbitrary graph, but if you have your graph under control, you just need to attach monitoring sinks to the output of each node which can fail or complete (these are nodes which have at least one output), for example:
import akka.actor.Status
// obtain graph parts (this can be done inside the graph building as well)
val source: Source[Int, NotUsed] = ...
val flow: Flow[Int, String, NotUsed] = ...
val sink: Sink[String, NotUsed] = ...
// create monitoring actors
val aggregate = actorSystem.actorOf(Props[Aggregate])
val sourceMonitorActor = actorSystem.actorOf(Props(new Monitor("source", aggregate)))
val flowMonitorActor = actorSystem.actorOf(Props(new Monitor("flow", aggregate)))
// create the graph
val graph = GraphDSL.create() { implicit b =>
import GraphDSL._
val sourceMonitor = b.add(Sink.actorRef(sourceMonitorActor, Status.Success(()))),
val flowMonitor = b.add(Sink.actorRef(flowMonitorActor, Status.Success(())))
val bc1 = b.add(Broadcast[Int](2))
val bc2 = b.add(Broadcast[String](2))
// main flow
source ~> bc1 ~> flow ~> bc2 ~> sink
// monitoring branches
bc1 ~> sourceMonitor
bc2 ~> flowMonitor
ClosedShape
}
// run the graph
RunnableGraph.fromGraph(graph).run()
class Monitor(name: String, aggregate: ActorRef) extends Actor {
override def receive: Receive = {
case Status.Success(_) => aggregate ! s"$name completed successfully"
case Status.Failure(e) => aggregate ! s"$name completed with failure: ${e.getMessage}"
case _ =>
}
}
class Aggregate extends Actor {
override def receive: Receive = {
case s: String => println(s)
}
}
It is also possible to create only one monitoring actor and use it in all monitoring sinks, but in that case you won't be able to differentiate easily between streams which have failed.
And there also is watchTermination() method on sources and flows which allows to materialize a future which terminates together with the flow at this point. I think it may be difficult to use with GraphDSL, but with regular stream methods it could look like this:
import akka.Done
import akka.actor.Status
import akka.pattern.pipe
val monitor = actorSystem.actorOf(Props[Monitor])
source
.watchTermination()((f, _) => f pipeTo monitor)
.via(flow).watchTermination((f, _) => f pipeTo monitor)
.to(sink)
.run()
class Monitor extends Actor {
override def receive: Receive = {
case Done => println("stream completed")
case Status.Failure(e) => println(s"stream failed: ${e.getMessage}")
}
}
You can transform the future before piping its value to the actor to differentiate between streams.
Related
I am quite new to Akka Streams, whereas I have some experience with Kafka Streams.
One thing it seems lacking in Akka Streams is the possibility to join together two different streams.
Kafka Streams allows joining information coming from two different streams (or tables) using the messages' keys.
Is there something similar in Akka Streams?
The short answer is unfortunately no. I would argue that Akka-streams is more low level than Kafka-Stream, Spark Streaming, or Flink. However, you have more control over what you are doing. Basically, it means that you can build your join operator. Check this discussion at lightbend.
Basically, you have to get data from 2 Sources, Merge them and send to a window based on time or number of tuples, compute the join, and emit the data to the Sink. I have done this PoC (which is still unfinished) but I follow the operators that I said to you here, and it is compiling and working. Basically, I still have to join the data inside the window. Currently, I am just emitting them in a mini-batch.
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{Attributes, ClosedShape, FlowShape, Inlet, Outlet}
import akka.stream.scaladsl.{Flow, GraphDSL, Merge, RunnableGraph, Sink, Source}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler, TimerGraphStageLogic}
import scala.collection.mutable
import scala.concurrent.duration._
object StreamOpenGraphJoin {
def main(args: Array[String]): Unit = {
implicit val system = ActorSystem("StreamOpenGraphJoin")
val incrementSource: Source[Int, NotUsed] = Source(1 to 10).throttle(1, 1 second)
val decrementSource: Source[Int, NotUsed] = Source(10 to 20).throttle(1, 1 second)
def tokenizerSource(key: Int) = {
Flow[Int].map { value =>
(key, value)
}
}
// Step 1 - setting up the fundamental for a stream graph
val switchJoinStrategies = RunnableGraph.fromGraph(
GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
// Step 2 - add partition and merge strategy
val tokenizerShape00 = builder.add(tokenizerSource(0))
val tokenizerShape01 = builder.add(tokenizerSource(1))
val mergeTupleShape = builder.add(Merge[(Int, Int)](2))
val batchFlow = Flow.fromGraph(new BatchTimerFlow[(Int, Int)](5 seconds))
val sinkShape = builder.add(Sink.foreach[(Int, Int)](x => println(s" > sink: $x")))
// Step 3 - tying up the components
incrementSource ~> tokenizerShape00 ~> mergeTupleShape.in(0)
decrementSource ~> tokenizerShape01 ~> mergeTupleShape.in(1)
mergeTupleShape.out ~> batchFlow ~> sinkShape
// Step 4 - return the shape
ClosedShape
}
)
// run the graph and materialize it
val graph = switchJoinStrategies.run()
}
// step 0: define the shape
class BatchTimerFlow[T](silencePeriod: FiniteDuration) extends GraphStage[FlowShape[T, T]] {
// step 1: define the ports and the component-specific members
val in = Inlet[T]("BatchTimerFlow.in")
val out = Outlet[T]("BatchTimerFlow.out")
// step 3: create the logic
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new TimerGraphStageLogic(shape) {
// mutable state
val batch = new mutable.Queue[T]
var open = false
// step 4: define mutable state implement my logic here
setHandler(in, new InHandler {
override def onPush(): Unit = {
try {
val nextElement = grab(in)
batch.enqueue(nextElement)
Thread.sleep(50) // simulate an expensive computation
if (open) pull(in) // send demand upstream signal, asking for another element
else {
// forward the element to the downstream operator
emitMultiple(out, batch.dequeueAll(_ => true).to[collection.immutable.Iterable])
open = true
scheduleOnce(None, silencePeriod)
}
} catch {
case e: Throwable => failStage(e)
}
}
})
setHandler(out, new OutHandler {
override def onPull(): Unit = {
pull(in)
}
})
override protected def onTimer(timerKey: Any): Unit = {
open = false
}
}
// step 2: construct a new shape
override def shape: FlowShape[T, T] = FlowShape[T, T](in, out)
}
}
I have a graph that accepts a sequence of files, processes them one by one ant then at the end of the execution, the program should return success (0) or failure (-1) if all the executions have succeeded or failed.
How could this last step be achieved? How could the Sink know when it is receiving the result for the last file?
val graph = createGraph("path-to-list-of-files")
val result = graph.run()
def createGraph(fileOrPath: String): RunnableGraph[NotUsed] = {
printStage("PREPARING") {
val producer: Source[ProducerFile, NotUsed] = Producer(fileOrPath).toSource()
val validator: Flow[ProducerFile, ProducerFile, NotUsed] = Validator().toFlow()
val provisioner: Flow[ProducerFile, PrivisionerResult, NotUsed] = Provisioner().toFlow()
val executor: Flow[PrivisionerResult, ExecutorResult, NotUsed] = Executor().toFlow()
val evaluator: Flow[ExecutorResult, EvaluatorResult, NotUsed] = Evaluator().toFlow()
val reporter: Sink[EvaluatorResult, Future[Done]] = Reporter().toSink()
val graphResult = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
producer ~> validator ~> provisioner ~> executor ~> evaluator ~> reporter
ClosedShape
})
printLine("The graph pipeline was created")
graphResult
}
Your reporter Sink already materializes to a Future[Done], which you can hook to if you want to run some code when all your elements have processed.
However, at the moment you are not exposing it in your graph. Although there is a way to expose it using the graph DSL, in your case it is even easier to use the fluent DSL to achieve this:
val graphResult: RunnableGraph[Future[Done]] = producer
.via(validator)
.via(provisioner)
.via(executor)
.via(evaluator)
.toMat(reporter)(Keep.right)
This will give you back the Future[Done] when you run your graph
val result: Future[Done] = graph.run()
which then you can hook to - e.g.
result.onComplete {
case Success(_) => println("Success!")
case Failure(_) => println("Failure..")
}
I have Akka Stream flow which is reading from file using alpakka, processing data and write into a file. I want to stop flow after processed n elements, count the time of duration and call system terminate. How can I achieve it?
My flow looks like that:
val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
sourceFile ~> mainFlow ~> sinkFile
ClosedShape
})
graph.run()
Do you have an idea? Thanks
Agreeing with what #Viktor said, first of all you don't need to use the graphDSL to achieve this, and you can use take(n) to complete the graph.
Secondly, you can use mapMaterializedValue to attach a callback to your Sink materialized value (which in turn should materializes to a Future[Something]).
val graph: RunnableGraph[Future[FiniteDuration]] =
sourceFile
.via(mainFlow)
.take(N)
.toMat(sinkFile)(Keep.right)
.mapMaterializedValue { f ⇒
val start = System.nanoTime()
f.map(_ ⇒ FiniteDuration(System.nanoTime() - start, TimeUnit.NANOSECONDS))
}
graph.run().onComplete { duration ⇒
println(s"Elapsed time: $duration")
}
Note that you are going to need an ExecutionContext in scope.
EDIT
Even if you have to use the graphDSL, the same concepts apply. You only need to expose the materialized Future of your sink and map on that.
val graph: RunnableGraph[Future[??Something??]] =
RunnableGraph.fromGraph(GraphDSL.create(sinkFile) { implicit builder: GraphDSL.Builder[Future[Something]] => snk =>
import GraphDSL.Implicits._
sourceFile ~> mainFlow ~> snk
ClosedShape
})
val timedGraph: RunnableGraph[Future[FiniteDuration]] =
graph.mapMaterializedValue { f ⇒
val start = System.nanoTime()
f.map(_ ⇒ FiniteDuration(System.nanoTime() - start, TimeUnit.NANOSECONDS))
}
timedGraph.run().onComplete { duration ⇒
println(s"Elapsed time: $duration")
}
No need for GraphDSL here.
val doneFuture = (sourceFile via mainFlow.take(N) runWith sinkFile) transformWith { _ => system.terminate() }
To obtain time, you can use akka-streams-contrib: https://github.com/akka/akka-stream-contrib/blob/master/contrib/src/main/scala/akka/stream/contrib/Timed.scala
I have built an akka graph that defines a flow. My objective is to reformat my future response and save it to a file. The flow can be outlined bellow:
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val balancer = builder.add(Balance[(HttpRequest, String)](6, waitForAllDownstreams = false))
val merger = builder.add(Merge[Future[Map[String, String]]](6))
val fileSink = FileIO.toPath(outputPath, options)
val ignoreSink = Sink.ignore
val in = Source(seeds)
in ~> balancer.in
for (i <- Range(0,6)) {
balancer.out(i) ~>
wikiFlow.async ~>
// This maps to a Future[Map[String, String]]
Flow[(Try[HttpResponse], String)].map(parseHtml) ~>
merger
}
merger.out ~>
// When we merge we need to map our Map to a file
Flow[Future[Map[String, String]]].map((d) => {
// What is the proper way of serializing future map
// so I can work with it like a normal stream into fileSink?
// I could manually do ->
// d.foreach(someWriteToFileProcess(_))
// with ignoreSink, but this defeats the nice
// akka flow
}) ~>
fileSink
ClosedShape
})
I can hack this workflow to write my future map to a file via foreach, but I'm afraid this could somehow lead to concurrency issues with FileIO and it just doesn't feel right. What is the proper way to handle futures with our akka flow?
The easiest way to create a Flow which involves an asynchronous computation is by using mapAsync.
So... lets say you want to create a Flow which consumes Int and produces String using an asynchronous computation mapper: Int => Future[String] with a parallelism of 5.
val mapper: Int => Future[String] = (i: Int) => Future(i.toString)
val yourFlow = Flow[Int].mapAsync[String](5)(mapper)
Now, you can use this flow in your graph however you want.
An example usage will be,
val graph = GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
val intSource = Source(1 to 10)
val printSink = Sink.foreach[String](s => println(s))
val yourMapper: Int => Future[String] = (i: Int) => Future(i.toString)
val yourFlow = Flow[Int].mapAsync[String](2)(yourMapper)
intSource ~> yourFlow ~> printSink
ClosedShape
}
I created an akka stream that has a process function and an error handler function passed into it. The Source and Sink are fully encapsulated in a ClosedShape RunnableFlow. My intention is that I would pass an item to the parent class and run it through the flow. This all seems to work until I get to testing. I'm using scala-test and passing appending to Lists inside the process function and error handler function. I'm randomly generating errors to see things flow to the error handler function as well. The problem is if I pass 100 items to the parent class then I would expect the list of items in the error function and the list of items in the process function to add up to 100. Since the Source and Sink are fully encapsulated I don't have a clear way to tell the test to wait and it gets to the assertion/should statements before all items have processed through the stream. I've created this gist to describe the stream.
Here's an example test for the above gist:
import akka.actor._
import akka.stream._
import akka.testkit._
import org.scalatest._
class TestSpec(_system: ActorSystem) extends TestKit(_system) with ImplicitSender
with WordSpecLike with Matchers with BeforeAndAfterAll {
def this() = this(ActorSystem("TestSpec"))
override def afterAll = {
Thread.sleep(500)
mat.shutdown()
TestKit.shutdownActorSystem(system)
}
implicit val mat = ActorMaterializer(ActorMaterializerSettings(system).withDebugLogging(true).withFuzzing(true))
"TestSpec" must {
"handle messages" in {
val testStream = new Testing() // For Testing class see gist: https://gist.github.com/leftofnull/3e4c2a6b18fe71d219b6
(1 to 100).map(n => testStream.processString(s"${n}${n * 2}${n * 4}${n * 8}")) // Give it 100 strings to chew on
testStream.errors.size should not be (0) // passes
testStream.processed.size should not be (0) // passes
(testStream.processed.size + testStream.errors.size) should be (100) // fails due to checking before all items are processed
}
}
}
Per the comment from Viktor Klang on the linked Gist. This proves to be a great solution:
def consume(
errorHandler: BadData => Unit, fn: Data => Unit, a: String
): RunnableGraph[Future[akka.Done]] = RunnableGraph.fromGraph(
GraphDSL.create(Sink.foreach[BadData](errorHandler)) { implicit b: GraphDSL.Builder[Unit] => sink =>
import GraphDSL.Implicits._
val source = b.add(Source.single(a))
val broadcast = b.add(Broadcast[String](2))
val merge = b.add(Zip[String, String])
val process = new ProcessorFlow(fn)
val failed = b.add(Flow[Xor[BadData, Data]].filter(x => x.isLeft))
val errors = b.add(new LeftFlow[Xor[BadData, Data], BadData](
(input: Xor[BadData, Data]) =>
input.swap.getOrElse((new Throwable, ("", "")))
))
source ~> broadcast.in
broadcast.out(0) ~> Flow[String].map(_.reverse) ~> merge.in0
broadcast.out(1) ~> Flow[String].map("| " + _ + " |") ~> merge.in1
merge.out ~> process ~> failed ~> errors ~> sink
ClosedShape
}
)
This allows me to Await.result on the RunnableGraph for testing purposes. Thanks again to Viktor for this solution!