I'm taking my first steps with Akka Streams. I have a graph similar to this one, copied from here:
val topHeadSink = Sink.head[Int]
val bottomHeadSink = Sink.head[Int]
val sharedDoubler = Flow[Int].map(_ * 2)
val g = RunnableGraph.fromGraph(GraphDSL.create(topHeadSink, bottomHeadSink)((_, _)) { implicit builder =>
(topHS, bottomHS) =>
import GraphDSL.Implicits._
val broadcast = builder.add(Broadcast[Int](2))
Source.single(1) ~> broadcast.in
broadcast.out(0) ~> sharedDoubler ~> topHS.in
broadcast.out(1) ~> sharedDoubler ~> bottomHS.in
ClosedShape
})
I can run the graph using g.run(), but how can I stop it?
In what circumstances should I do so (other than it no longer being needed, business-wise)?
This graph is contained within an actor. If the actor crashes, what happens to the graph's underlying actor? Will it terminate as well?
As described in the documentation, the way to complete a graph from outside the graph is with KillSwitch. The example that you copied from the documentation is not a good candidate to illustrate this approach, as the source is only a single element, and the stream will complete very quickly when you run it. Let's adjust the graph to more easily see the KillSwitch in action:
val topSink = Sink.foreach(println)
val bottomSink = Sink.foreach(println)
val sharedDoubler = Flow[Int].map(_ * 2)
val killSwitch = KillSwitches.single[Int]
val g = RunnableGraph.fromGraph(GraphDSL.create(topSink, bottomSink, killSwitch)((_, _, _)) {
implicit builder => (topS, bottomS, switch) =>
import GraphDSL.Implicits._
val broadcast = builder.add(Broadcast[Int](2))
Source.fromIterator(() => (1 to 1000000).iterator) ~> switch ~> broadcast.in
broadcast.out(0) ~> sharedDoubler ~> topS.in
broadcast.out(1) ~> sharedDoubler ~> bottomS.in
ClosedShape
})
val res = g.run() // res is of type (Future[Done], Future[Done], UniqueKillSwitch)
Thread.sleep(1000)
res._3.shutdown()
The source now consists of one million elements, and the sinks now print the broadcasted elements. The stream runs for one second, which is not enough time to churn through all one million elements, before we call shutdown to complete the stream.
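For a linear stream the same idea works without the Graph DSL. The following is a minimal sketch, assuming the Akka 2.5-style ActorMaterializer used elsewhere in this thread; the system name and the infinite source are made up for illustration:

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, KillSwitches}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("killswitch-sketch") // hypothetical name
implicit val materializer: ActorMaterializer = ActorMaterializer()

// Keep both the kill switch and the sink's completion future.
val (killSwitch, done) =
  Source.fromIterator(() => Iterator.from(1)) // never completes on its own
    .viaMat(KillSwitches.single)(Keep.right)
    .toMat(Sink.ignore)(Keep.both)
    .run()

// shutdown() completes the downstream and cancels the upstream.
killSwitch.shutdown()
val result: Done = Await.result(done, 5.seconds)
system.terminate()
```

Calling abort(exception) on the switch instead of shutdown() would fail the stream rather than complete it.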
If you run a stream inside an actor, whether the lifecycle of the underlying actor (or actors) created to run the stream is tied to the lifecycle of the "enclosing" actor depends on how the materializer is created. Read the documentation for more information. The following blog post by Colin Breck about using an actor and KillSwitch to manage the lifecycle of a stream is helpful as well: http://blog.colinbreck.com/integrating-akka-streams-and-akka-actors-part-ii/
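As a rough sketch of the materializer point, assuming the Akka 2.5-style API: a materializer created from the actor's context binds the stream to that actor, whereas one created from the actor system does not. The actor class and the "stop-stream" message below are hypothetical:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.stream.{ActorMaterializer, KillSwitches, UniqueKillSwitch}
import akka.stream.scaladsl.{Keep, Sink, Source}

// Hypothetical actor that owns a stream, for illustration only.
class StreamOwner extends Actor {
  // Created from the actor's context, so streams run with this materializer
  // are stopped when this actor stops. ActorMaterializer()(context.system)
  // would instead let them outlive the actor.
  implicit val materializer: ActorMaterializer = ActorMaterializer()(context)

  private val killSwitch: UniqueKillSwitch =
    Source.fromIterator(() => Iterator.from(1))
      .viaMat(KillSwitches.single)(Keep.right)
      .to(Sink.ignore)
      .run()

  // Complete the stream explicitly on stop; repeated shutdown() calls are ignored.
  override def postStop(): Unit = killSwitch.shutdown()

  def receive: Receive = {
    case "stop-stream" => killSwitch.shutdown()
  }
}

// Usage (system name is made up):
// val system = ActorSystem("owner-sketch")
// val owner = system.actorOf(Props(new StreamOwner))
```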
There's a KillSwitch feature that should work for you. Check the answer to this other SO question: Proper way to stop Akka Streams on condition
Related
I built an Akka graph with the DSL that defines a simple flow. But the flow f4 takes 3 seconds to send an element while f2 takes 10 seconds.
As a result, I got : 3, 2, 3, 2. But, this is not what I want. As f2 takes too much time, I would like to get : 3, 3, 2, 2. Here's the code...
implicit val actorSystem = ActorSystem("NumberSystem")
implicit val materializer = ActorMaterializer()
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val in = Source(List(1, 1))
val out = Sink.foreach(println)
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val yourMapper: Int => Future[Int] = (i: Int) => Future(i + 1)
val yourMapper2: Int => Future[Int] = (i: Int) => Future(i + 2)
val f1, f3 = Flow[Int]
val f2= Flow[Int].throttle(1, 10.second, 0, ThrottleMode.Shaping).mapAsync[Int](2)(yourMapper)
val f4= Flow[Int].throttle(1, 3.second, 0, ThrottleMode.Shaping).mapAsync[Int](2)(yourMapper2)
in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
bcast ~> f4 ~> merge
ClosedShape
})
g.run()
So where am I going wrong? With Future, with mapAsync, or somewhere else?
Thanks
Sorry, I'm new to Akka, so I'm still learning. One way to get the expected results is to add async:
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val in = Source(List(1, 1))
val out = Sink.foreach(println)
val bcast = builder.add(Broadcast[Int](2))
val merge = builder.add(Merge[Int](2))
val yourMapper: Int => Future[Int] = (i: Int) => Future(i + 1)
val yourMapper2: Int => Future[Int] = (i: Int) => Future(i + 2)
val f1, f3 = Flow[Int]
val f2= Flow[Int].throttle(1, 10.second, 0, ThrottleMode.Shaping).map(_+1)
//.mapAsyncUnordered[Int](2)(yourMapper)
val f4= Flow[Int].throttle(1, 3.second, 0, ThrottleMode.Shaping).map(_+2)
//.mapAsync[Int](2)(yourMapper2)
in ~> f1 ~> bcast ~> f2.async ~> merge ~> f3 ~> out
bcast ~> f4.async ~> merge
ClosedShape
})
g.run()
As you've already figured out, replacing:
mapAsync(2)(i => Future(i + delta))
with:
map(_ + delta).async
in the two flows would achieve what you want.
The different result boils down to the key difference between mapAsync and map + async. While mapAsync lets the Futures execute in parallel threads, the multiple mapAsync flow stages are still managed by the same underlying actor, because operator fusion is applied before execution (generally for cost efficiency).
On the other hand, async actually introduces an asynchronous boundary into the stream flow with the individual flow stages handled by separate actors. In your case, each of the two flow stages independently emits elements downstream and whichever element emitted first gets consumed first. Inevitably there is a cost for managing the stream across the asynchronous boundary and Akka Stream uses a windowed buffering strategy to amortize the cost (see this Akka Stream doc).
For more details re: difference between mapAsync and async, this blog post might be of interest.
So you are trying to join together the results coming out of f2 and f4, which is sometimes called the "scatter-gather" pattern.
I don't think there is an off-the-shelf way to implement it without adding a custom stateful stage that keeps track of the outputs from f2 and f4 and emits a record when both are available. But there are some complications to bear in mind:
What happens if f2 or f4 fails
What happens if they take too long
You need a unique key for each input record, so you know which output from f2 corresponds to which output from f4 (or vice versa)
I am struggling with understanding if akka-stream enforces backpressure on Source when having a broadcast with one branch taking a lot of time (asynchronous) in the graph.
I tried buffer and batch to see if there was any backpressure applied on the source but it does not look like it. I also tried flushing System.out but it does not change anything.
object Test extends App {
/* Necessary for akka stream */
implicit val system = ActorSystem("test")
implicit val materializer: ActorMaterializer = ActorMaterializer()
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val in = Source.tick(0 seconds, 1 seconds, 1)
in.runForeach(i => println("Produced " + i))
val out = Sink.foreach(println)
val out2 = Sink.foreach[Int]{ o => println(s"2 $o") }
val bcast = builder.add(Broadcast[Int](2))
val batchedIn: Source[Int, Cancellable] = in.batch(4, identity) {
case (s, v) => println(s"Batched ${s+v}"); s + v
}
val f2 = Flow[Int].map(_ + 10)
val f4 = Flow[Int].map { i => Thread.sleep(2000); i}
batchedIn ~> bcast ~> f2 ~> out
bcast ~> f4.async ~> out2
ClosedShape
})
g.run()
}
I would expect to see "Batched ..." in the console when I am running the program and at some point to have it momentarily stuck because f4 is not fast enough to process the values. At the moment, none of those behave as expected as the numbers are generated continuously and no batch is done.
EDIT: I noticed that after some time, the batch messages start to print out in the console. I still don't know why it does not happen sooner as the backpressure should happen for the first elements
This behavior is explained by the internal buffers that Akka introduces when async boundaries are set. From the "Buffers for asynchronous operators" section of the documentation, which describes the internal buffers that are introduced as an optimization when using asynchronous operators:
While pipelining in general increases throughput, in practice there is a cost of passing an element through the asynchronous (and therefore thread crossing) boundary which is significant. To amortize this cost Akka Streams uses a windowed, batching backpressure strategy internally. It is windowed because as opposed to a Stop-And-Wait protocol multiple elements might be “in-flight” concurrently with requests for elements. It is also batching because a new element is not immediately requested once an element has been drained from the window-buffer but multiple elements are requested after multiple elements have been drained. This batching strategy reduces the communication cost of propagating the backpressure signal through the asynchronous boundary.
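To see backpressure reach the source sooner, the window buffer of an asynchronous stage can be shrunk with an attribute. This is a minimal sketch rather than a recommendation; the name slowStage is made up, and the default buffer size of 16 mentioned in the comment is the value of akka.stream.materializer.max-input-buffer-size as far as I know:

```scala
import akka.stream.Attributes
import akka.stream.scaladsl.Flow

// A slow stage behind an async boundary. With the default input buffer
// (16 elements, as far as I know) the upstream is not slowed down until
// that buffer has filled; here the buffer is reduced to a single element.
val slowStage =
  Flow[Int]
    .map { i => Thread.sleep(2000); i }
    .async
    .addAttributes(Attributes.inputBuffer(initial = 1, max = 1))
```

Used in place of f4.async in the question's graph, this should make the batching kick in after far fewer elements, at the cost of more signalling across the asynchronous boundary.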
I understand that this is a toy stream, but if you explain what your goal is, I will try to help you.
You need mapAsync instead of async:
val g = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import akka.stream.scaladsl.GraphDSL.Implicits._
val in = Source.tick(0 seconds, 1 seconds, 1).map(x => {println(s"Produced ${x}"); x})
val out = Sink.foreach[Int]{ o => println(s"F2 processed $o") }
val out2 = Sink.foreach[Int]{ o => println(s"F4 processed $o") }
val bcast = builder.add(Broadcast[Int](2))
val batchedIn: Source[Int, Cancellable] = in.buffer(4,OverflowStrategy.backpressure)
val f2 = Flow[Int].map(_ + 10)
val f4 = Flow[Int].mapAsync(1) { i => Future { println("F4 Started Processing"); Thread.sleep(2000); i }(system.dispatcher) }
batchedIn ~> bcast ~> f2 ~> out
bcast ~> f4 ~> out2
ClosedShape
}).run()
I'm trying out Akka Stream API and I have no idea why this throws java.lang.IllegalArgumentException: [Partition.in] is already connected in line 5
val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
import GraphDSL.Implicits._
val intSource = Source.fromIterator(() => Iterator.continually(Random.nextInt(10).toString))
val validateInput: Flow[String, Message, NotUsed] = Flow[String].map(Message.fromString)
val validationPartitioner = Partition[Message](2, { // #5 error here
case _: Data => 0
case _ => 1
})
val outputStream = Sink.foreach[Message](println(_))
val errorStream = Sink.ignore
intSource ~> validateInput ~> validationPartitioner.in
validationPartitioner.out(0) ~> outputStream
validationPartitioner.out(1) ~> errorStream
ClosedShape
})
but if I change validationPartitioner to be wrapped in builder.add(...) and remove .in from
intSource ~> validateInput ~> validationPartitioner.in
Everything works. If I just remove .in, the code doesn't compile. Why is use of the builder being forced? Am I missing something, or is it a bug?
All of the components of a graph must be added to the builder, but there are variants of the ~> operator that add the most commonly used components, such as Source and Flow, to the builder under the covers (see here and here). However, junction operations that perform a fan-in (such as Merge) or a fan-out (such as Partition) must be explicitly passed to builder.add if you're using the Graph DSL.
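A corrected version of the question's graph might look like the sketch below. The Message, Data and Invalid types and the parse function are hypothetical stand-ins for the question's own definitions, which were not shown:

```scala
import akka.NotUsed
import akka.stream.ClosedShape
import akka.stream.scaladsl.{Flow, GraphDSL, Partition, RunnableGraph, Sink, Source}

// Hypothetical message types standing in for the question's Message and Data.
sealed trait Message
final case class Data(value: Int) extends Message
final case class Invalid(raw: String) extends Message

// Hypothetical validation: digit-only strings are Data, anything else is Invalid.
def parse(s: String): Message =
  if (s.nonEmpty && s.forall(_.isDigit)) Data(s.toInt) else Invalid(s)

val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
  import GraphDSL.Implicits._
  val input = Source(List("1", "x", "2"))
  val validate = Flow[String].map(parse)
  // Fan-out junction: added to the builder explicitly, which yields a shape
  // whose ports (in, out(0), out(1)) belong to this graph.
  val partition = builder.add(Partition[Message](2, {
    case _: Data => 0
    case _       => 1
  }))
  // Source, Flow and Sink are added implicitly by the ~> operator.
  input ~> validate ~> partition.in
  partition.out(0) ~> Sink.foreach[Message](println)
  partition.out(1) ~> Sink.ignore
  ClosedShape
})
```

Running it still requires an implicit materializer in scope, as in the other examples in this thread.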
The Akka documentation is vast and there are a lot of tutorials. But either they are outdated or they only cover the basics (or, maybe I simply can't find the right ones).
What I want to create is a websocket application with multiple clients and multiple sources on the server side. As I don't want to get over my head from the start, I want to make baby steps and incrementally increase the complexity of the software I am building.
After toying around with some simple flows I wanted to start with a more sophisticated graph now.
What I want is:
Two sources, one that pushes "keepAlive" messages from the server to the client (currently only one) and a second one that actually pushes useful data.
Now for the first one I have this:
val tickingSource: Source[Array[Byte], Cancellable] =
Source.tick(initialDelay = 1 second, interval = 10 seconds, tick = NotUsed)
.zipWithIndex
.map{ case (_, counter) => SomeMessage().toByteArray}
Where SomeMessage is a protobuf type.
Because I can't find an up-to-date way to add an actor as a source, I tried the following for my second source:
val secondSource = Source(1 to 1000)
val secondSourceConverter = Flow[Int].map(x => BigInteger.valueOf(x).toByteArray)
My attempt at the graph:
val g: RunnableGraph[NotUsed] = RunnableGraph.fromGraph(GraphDSL.create()
{
implicit builder =>
import GraphDSL.Implicits._
val sourceMerge = builder.add(Merge[Array[Byte]](2).named("sourceMerge"))
val x = Source(1 to 1000)
val y = Flow[Int].map(x => BigInteger.valueOf(x).toByteArray)
val out = Sink.ignore
tickingSource ~> sourceMerge ~> out
x ~> y ~> sourceMerge
ClosedShape
})
Now g is of type RunnableGraph[NotUsed] while it should be RunnableGraph[Array[Byte]] for my websocket. And I wonder here: am I already doing something completely wrong?
You need to pass the secondSourceConverter into the GraphDSL.create, like the following example taken from their docs. Here they import 2 sinks, but it's the same technique.
RunnableGraph.fromGraph(GraphDSL.create(topHeadSink, bottomHeadSink)((_, _)) { implicit builder =>
(topHS, bottomHS) =>
import GraphDSL.Implicits._
val broadcast = builder.add(Broadcast[Int](2))
Source.single(1) ~> broadcast.in
broadcast.out(0) ~> sharedDoubler ~> topHS.in
broadcast.out(1) ~> sharedDoubler ~> bottomHS.in
ClosedShape
})
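Your graph is of type RunnableGraph[NotUsed] because GraphDSL.create() is called without passing in any component whose materialized value should be kept, so the Future[Done] materialized by Sink.ignore is discarded. And you probably want a RunnableGraph[Future[Array[Byte]]] instead of a RunnableGraph[Array[Byte]]: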
val byteSink = Sink.fold[Array[Byte], Array[Byte]](Array[Byte]())(_ ++ _)
val g = RunnableGraph.fromGraph(GraphDSL.create(byteSink) { implicit builder => bSink =>
import GraphDSL.Implicits._
val sourceMerge = builder.add(Merge[Array[Byte]](2))
tickingSource ~> sourceMerge ~> bSink.in
secondSource ~> secondSourceConverter ~> sourceMerge
ClosedShape
})
// RunnableGraph[Future[Array[Byte]]]
I'm not sure how you would like to process incoming messages but here is a simple example. Hope that it'll help you.
path("ws") {
  extractUpgradeToWebSocket { upgrade =>
    complete {
      import scala.concurrent.duration._
      val tickSource = Source.tick(1.second, 1.second, TextMessage("ping"))
      // mapMaterializedValue returns a new Source, so keep the result
      val messagesSource = Source.queue[Message](10, OverflowStrategy.backpressure)
        .mapMaterializedValue { queue =>
          // do something with the out queue,
          // like myHandler ! RegisterOutQueue(queue)
          queue
        }
      val sink = Sink.ignore
      val source = tickSource.merge(messagesSource)
      upgrade.handleMessagesWithSinkSource(
        inSink = sink,
        outSource = source
      )
    }
  }
}
I'm having problems in getting Publishers and Subscribers out of my flows when using more complicated graphs. My goal is to provide an API of Publishers and Subscribers and run the Akka streaming internally. Here's my first try, which works just fine.
val subscriberSource = Source.subscriber[Boolean]
val someFunctionSink = Sink.foreach(Console.println)
val flow = subscriberSource.to(someFunctionSink)
//create Reactive Streams Subscriber
val subscriber: Subscriber[Boolean] = flow.run()
//prints true
Source.single(true).to(Sink(subscriber)).run()
But then, with a more complicated broadcast graph, I'm unsure how to get the Subscriber and Publisher objects out. Do I need a partial graph?
val subscriberSource = Source.subscriber[Boolean]
val someFunctionSink = Sink.foreach(Console.println)
val publisherSink = Sink.publisher[Boolean]
FlowGraph.closed() { implicit builder =>
import FlowGraph.Implicits._
val broadcast = builder.add(Broadcast[Boolean](2))
subscriberSource ~> broadcast.in
broadcast.out(0) ~> someFunctionSink
broadcast.out(1) ~> publisherSink
}.run()
val subscriber: Subscriber[Boolean] = ???
val publisher: Publisher[Boolean] = ???
When you call RunnableGraph.run() the stream is run and the result is the "materialized value" for that run.
In your simple example the materialized value of Source.subscriber[Boolean] is Subscriber[Boolean]. In your complex example you want to combine materialized values of several components of your graph to a materialized value that is a tuple (Subscriber[Boolean], Publisher[Boolean]).
You can do that by passing the components whose materialized values you are interested in to FlowGraph.closed() and then specifying a function to combine the materialized values:
import akka.stream.scaladsl._
import org.reactivestreams._
val subscriberSource = Source.subscriber[Boolean]
val someFunctionSink = Sink.foreach(Console.println)
val publisherSink = Sink.publisher[Boolean]
val graph =
FlowGraph.closed(subscriberSource, publisherSink)(Keep.both) { implicit builder ⇒
(in, out) ⇒
import FlowGraph.Implicits._
val broadcast = builder.add(Broadcast[Boolean](2))
in ~> broadcast.in
broadcast.out(0) ~> someFunctionSink
broadcast.out(1) ~> out
}
val (subscriber: Subscriber[Boolean], publisher: Publisher[Boolean]) = graph.run()
See the Scaladocs for more information about the overloads of FlowGraph.closed.
(Keep.both is short for a function (a, b) => (a, b))
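The code above uses the early FlowGraph API. In later Akka versions the same idea would, as far as I can tell, be written with GraphDSL.create, Source.asSubscriber and Sink.asPublisher. A sketch under that assumption, with a made-up system name and a small round trip through the resulting Subscriber and Publisher:

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{Broadcast, GraphDSL, Keep, RunnableGraph, Sink, Source}
import org.reactivestreams.{Publisher, Subscriber}
import scala.concurrent.Await
import scala.concurrent.duration._

implicit val system: ActorSystem = ActorSystem("pubsub-sketch") // hypothetical name
implicit val materializer: ActorMaterializer = ActorMaterializer()

val graph = RunnableGraph.fromGraph(
  GraphDSL.create(Source.asSubscriber[Boolean], Sink.asPublisher[Boolean](fanout = false))(Keep.both) {
    implicit builder => (in, out) =>
      import GraphDSL.Implicits._
      val broadcast = builder.add(Broadcast[Boolean](2))
      in.out ~> broadcast.in
      broadcast.out(0) ~> Sink.foreach[Boolean](println)
      broadcast.out(1) ~> out.in
      ClosedShape
  })

// The materialized value is the tuple produced by Keep.both.
val (subscriber, publisher): (Subscriber[Boolean], Publisher[Boolean]) = graph.run()

// Attach a consumer to the Publisher first, then feed the Subscriber.
val received = Source.fromPublisher(publisher).runWith(Sink.seq)
Source.single(true).runWith(Sink.fromSubscriber(subscriber))
val result = Await.result(received, 5.seconds)
system.terminate()
```

With fanout = false the Publisher accepts only a single subscriber; pass fanout = true if several consumers need to attach.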