I'm quite new to ZIO. I'm writing a crypto trading bot in Scala and trying to learn ZIO at the same time. I'm opening a websocket, and this websocket invokes a callback multiple times until it is closed, which I'm struggling to integrate into my code. My current code:
object Main extends zio.App with Logging {
  def run(args: List[String]): URIO[Any with Console, ExitCode] =
    Configuration.getConfiguration.fold(onError, start).exitCode

  private val interval: CandlestickInterval = CandlestickInterval.ONE_MINUTE

  private def onError(exception: ConfigurationException): ZIO[Any, Throwable, Unit] = {
    logger.info("Could not initialize traderbot!")
    logger.error(exception.getMessage)
    IO.unit
  }

  private final def start(configuration: Configuration): ZIO[Any, Throwable, Unit] = {
    for {
      binanceClient <- IO.succeed(BinanceApiClientFactory.newInstance(configuration.apiKey, configuration.secret))
      webSocketClient <- IO.succeed(binanceClient.newWebSocketClient())
      // effectAsync expects its callback to be invoked at most once,
      // so only a single event can be delivered here.
      candlestick <- Task.effectAsync[CandlestickEvent] { callback =>
        webSocketClient.onCandlestickEvent(
          "adaeur",
          interval,
          d => callback(IO.succeed(d))
        )
      }
      // TODO Calculate RSI from candlesticks.
    } yield ()
  }
}
I would like to keep receiving candlestick events and keep things functional. I've seen a few things about ZIO Streams, but I can't find examples that deal with recurring callbacks and are simple to understand. Right now I can't use the candlestick events in the for comprehension.
Thanks for your time!
Unfortunately, effectAsync can't handle multiple callbacks, since the resulting ZIO represents a single success or failure value.
You can use ZStream instead, which has a similarly shaped operator whose callback can be invoked multiple times:
import zio.{Chunk, IO, UIO}
import zio.stream.ZStream

private final def start(configuration: Configuration): ZStream[Any, Throwable, Unit] = {
  val candlesticks: ZStream[Any, Throwable, CandlestickEvent] = ZStream.unwrap(
    IO.effectTotal {
      val webSocketClient = BinanceApiClientFactory
        .newInstance(configuration.apiKey, configuration.secret)
        .newWebSocketClient()
      // This variant accepts a return value in the `Left`, which
      // is run during shutdown to make sure that the websocket is
      // cleaned up.
      ZStream.effectAsyncInterrupt[Any, Throwable, CandlestickEvent] { cb =>
        val closeable = webSocketClient.onCandlestickEvent(
          "adaeur",
          interval,
          // In ZIO 1.0 the callback expects a Chunk, so wrap each event
          d => cb(IO.succeed(Chunk.single(d)))
        )
        Left(UIO(closeable.close()))
      }
    }
  )
  for {
    candlestick <- candlesticks
    // TODO Calculate RSI from candlesticks.
  } yield ()
}
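For the RSI TODO, one option is to keep the calculation inside the stream with a stateful operator. The sketch below assumes ZIO 1.0.x; `closes` is a hypothetical stand-in for the websocket stream (in practice you would `map` each CandlestickEvent to its closing price), and it computes a simple 3-price moving average with `mapAccum`, not RSI itself. An RSI calculation could carry its own running state through `mapAccum` in the same way, and `run` consumes the stream with `foreach`.

```scala
import zio._
import zio.console._
import zio.stream.ZStream

object RollingCloseExample extends zio.App {
  // Hypothetical stand-in for the websocket stream: just closing prices.
  val closes: ZStream[Any, Nothing, Double] =
    ZStream.fromIterable(List(1.0, 2.0, 3.0, 4.0, 5.0))

  // The state is the last `window` prices; an average is emitted once the window is full.
  def rollingAverage(window: Int): ZStream[Any, Nothing, Option[Double]] =
    closes.mapAccum(Vector.empty[Double]) { (buf, price) =>
      val next = (buf :+ price).takeRight(window)
      (next, if (next.size == window) Some(next.sum / window) else None)
    }

  def run(args: List[String]): URIO[ZEnv, ExitCode] =
    rollingAverage(3)
      .collect { case Some(avg) => avg }
      .foreach(avg => putStrLn(s"average: $avg")) // prints 2.0, 3.0 and 4.0
      .exitCode
}
```

Because the stream is only a description, the same `start(configuration)` stream from the answer can be consumed the same way (for example with `.runDrain`) from `run`.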
I'm writing a function in Scala with Cats that does some I/O and will be called by other parts of the code. I'm also learning Cats, and I want my function to:
Be generic in its effect and use an F[_]
Run on a dedicated thread pool
Introduce async boundaries
I assume that all my functions are generic in F[_] up to the main method, because I'm trying to follow these Cats guidelines.
But I'm struggling to make these constraints work using ContextShift or ExecutionContext. I have written a full example here, and this is an extract from it:
object ComplexOperation {
  // Thread pool for ComplexOperation internal use only
  val cs = IO.contextShift(
    ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())
  )

  // Complex operation that takes resources and time.
  // Note: Sync[F].delay(cs.shift) only wraps the IO[Unit] value; the shift itself never runs.
  def run[F[_]: Sync](input: String): F[String] =
    for {
      r1 <- Sync[F].delay(cs.shift) *> op1(input)
      r2 <- Sync[F].delay(cs.shift) *> op2(r1)
      r3 <- Sync[F].delay(cs.shift) *> op3(r2)
    } yield r3

  def op1[F[_]: Sync](input: String): F[Int] = Sync[F].delay(input.length)
  def op2[F[_]: Sync](input: Int): F[Boolean] = Sync[F].delay(input % 2 == 0)
  def op3[F[_]: Sync](input: Boolean): F[String] = Sync[F].delay(s"Complex result: $input")
}
This clearly doesn't abstract over effects as ComplexOperation.run needs a ContextShift[IO] to be able to introduce async boundaries. What is the right (or best) way of doing this?
Creating ContextShift[IO] inside ComplexOperation.run makes the function depend on IO which I don't want.
Moving the creation of a ContextShift[IO] to the caller will simply shift the problem: the caller is also generic in F[_], so how does it obtain a ContextShift[IO] to pass to ComplexOperation.run without explicitly depending on IO?
Remember that I don't want to use one global ContextShift[IO] defined at the topmost level but I want each component to decide for itself.
Should my ComplexOperation.run create the ContextShift[IO] or is it the responsibility of the caller?
Am I doing this right at least? Or am I going against standard practices?
So I took the liberty to rewrite your code, hope it helps:
import cats.effect._

object Functions {
  def sampleFunction[F[_]: Sync : ContextShift](file: String, blocker: Blocker): F[String] = {
    val handler: Resource[F, Int] =
      Resource.make(
        blocker.blockOn(openFile(file))
      ) { handler =>
        blocker.blockOn(closeFile(handler))
      }

    handler.use(handler => doWork(handler))
  }

  private def openFile[F[_]: Sync](file: String): F[Int] = Sync[F].delay {
    println(s"Opening file $file with handler 2")
    2
  }

  private def closeFile[F[_]: Sync](handler: Int): F[Unit] = Sync[F].delay {
    println(s"Closing file handler $handler")
  }

  private def doWork[F[_]: Sync](handler: Int): F[String] = Sync[F].delay {
    println(s"Calculating the value on file handler $handler")
    "The final value"
  }
}

object Main extends IOApp {
  override def run(args: List[String]): IO[ExitCode] = {
    val result = Blocker[IO].use { blocker =>
      Functions.sampleFunction[IO](file = "filePath", blocker)
    }

    for {
      data <- result
      _ <- IO(println(data))
    } yield ExitCode.Success
  }
}
You can see it running here.
So, what does this code do?
First, it creates a Resource for the file, since the file has to be closed whether the work succeeds or fails.
It uses a Blocker to run the open and close operations on a blocking thread pool (which is done using ContextShift).
Finally, in main, it creates a default Blocker instance for IO, uses it to call your function, and prints the result.
Feel free to ask any questions.
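On the question's last point (each component choosing its own pool without depending on IO), one hedged sketch for cats-effect 2 is to let the component own its pool as a Resource and switch to it with ContextShift[F].evalOn, which only needs the ContextShift[F] type class. The class name and the single-thread pool mirror the question; `resource` and `ComplexOperationDemo` are illustrative names, not an established API.

```scala
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
import cats.effect._
import cats.implicits._

// Illustrative component that owns a private single-thread pool. It only needs
// Sync and ContextShift, so it stays generic in F[_] and never mentions IO.
final class ComplexOperation[F[_]: Sync: ContextShift] private (pool: ExecutionContext) {
  // evalOn runs the block on `pool` and shifts back to the default pool afterwards.
  def run(input: String): F[String] =
    ContextShift[F].evalOn(pool) {
      for {
        r1 <- Sync[F].delay(input.length)
        r2 <- Sync[F].delay(r1 % 2 == 0)
      } yield s"Complex result: $r2"
    }
}

object ComplexOperation {
  // The component decides its own pool and also owns its lifecycle via Resource.
  def resource[F[_]: Sync: ContextShift]: Resource[F, ComplexOperation[F]] =
    Resource
      .make(Sync[F].delay(Executors.newSingleThreadExecutor()))(es => Sync[F].delay(es.shutdown()))
      .map(es => new ComplexOperation[F](ExecutionContext.fromExecutorService(es)))
}

object ComplexOperationDemo extends IOApp {
  // Only the program's entry point picks IO.
  def run(args: List[String]): IO[ExitCode] =
    ComplexOperation
      .resource[IO]
      .use(op => op.run("hello").flatMap(r => IO(println(r)))) // prints "Complex result: false"
      .as(ExitCode.Success)
}
```

With this shape the caller never needs a ContextShift[IO]: it only passes its own ContextShift[F], and the dedicated pool stays an internal detail of the component.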
I have an external (that is, I cannot change it) Java API which looks like this:
public interface Sender {
    void send(Event e);
}
I need to implement a Sender which accepts each event, transforms it to a JSON object, collects some number of them into a single bundle and sends over HTTP to some endpoint. This all should be done asynchronously, without send() blocking the calling thread, with some fixed-size buffer and dropping new events if the buffer is full.
With akka-streams this is quite simple: I create a graph of stages (which uses akka-http to send HTTP requests), materialize it and use the materialized ActorRef to push new events to the stream:
lazy val eventPipeline = Source.actorRef[Event](Int.MaxValue, OverflowStrategy.fail)
  .via(CustomBuffer(bufferSize)) // buffer all events
  .groupedWithin(batchSize, flushDuration) // group events into chunks
  .map(toBundle) // convert each chunk into a JSON message
  .mapAsyncUnordered(1)(sendHttpRequest) // send an HTTP request
  .toMat(Sink.foreach { response =>
    // print HTTP response for debugging
  })(Keep.both)

lazy val (eventsActor, completeFuture) = eventPipeline.run()

override def send(e: Event): Unit = {
  eventsActor ! e
}
Here CustomBuffer is a custom GraphStage which is very similar to the library-provided Buffer but tailored to our specific needs; it probably does not matter for this particular question.
As you can see, interacting with the stream from non-stream code is very simple - the ! method on the ActorRef trait is asynchronous and does not need any additional machinery to be called. Each event which is sent to the actor is then processed through the entire reactive pipeline. Moreover, because of how akka-http is implemented, I even get connection pooling for free, so no more than one connection is opened to the server.
However, I cannot find a way to do the same thing with FS2 properly. Even discarding the question of buffering (I will probably need to write a custom Pipe implementation which does additional things that we need) and HTTP connection pooling, I'm still stuck with a more basic thing - that is, how to push the data to the reactive stream "from outside".
All tutorials and documentation that I can find assume that the entire program happens inside some effect context, usually IO. This is not my case - the send() method is invoked by the Java library at unspecified times. Therefore, I just cannot keep everything inside one IO action, I necessarily have to finalize the "push" action inside the send() method, and have the reactive stream as a separate entity, because I want to aggregate events and hopefully pool HTTP connections (which I believe is naturally tied to the reactive stream).
I assume that I need some additional data structure, like Queue. fs2 does indeed have some kind of fs2.concurrent.Queue, but again, all documentation shows how to use it inside a single IO context, so I assume that doing something like
val queue: Queue[IO, Event] = Queue.unbounded[IO, Event].unsafeRunSync()
and then using queue inside the stream definition and then separately inside the send() method with further unsafeRun calls:
val eventPipeline = queue.dequeue
  .through(customBuffer(bufferSize))
  .groupWithin(batchSize, flushDuration)
  .map(toBundle)
  .mapAsyncUnordered(1)(sendRequest)
  .evalTap(response => ...)
  .compile
  .drain

eventPipeline.unsafeRunAsync(...) // or something

override def send(e: Event) {
  queue.enqueue(e).unsafeRunSync()
}
is not the correct way and most likely would not even work.
So, my question is, how do I properly use fs2 to solve my problem?
Consider the following example:
import cats.implicits._
import cats.effect._
import cats.effect.implicits._
import fs2._
import fs2.concurrent.Queue

import scala.concurrent.ExecutionContext
import scala.concurrent.duration._

object Answer {
  type Event = String

  trait Sender {
    def send(event: Event): Unit
  }

  def main(args: Array[String]): Unit = {
    val sender: Sender = {
      val ec = ExecutionContext.global
      implicit val cs: ContextShift[IO] = IO.contextShift(ec)
      implicit val timer: Timer[IO] = IO.timer(ec)

      fs2Sender[IO](2)
    }

    val events = List("a", "b", "c", "d")
    events.foreach { evt => new Thread(() => sender.send(evt)).start() }
    Thread sleep 3000
  }

  def fs2Sender[F[_]: Timer : ContextShift](maxBufferedSize: Int)(implicit F: ConcurrentEffect[F]): Sender = {
    // dummy impl
    // this is where the actual logic for batching
    // and shipping over the network would live
    val consume: Pipe[F, Event, Unit] = _.evalMap { event =>
      for {
        _ <- F.delay { println(s"consuming [$event]...") }
        _ <- Timer[F].sleep(1.seconds)
        _ <- F.delay { println(s"...[$event] consumed") }
      } yield ()
    }

    val suspended = for {
      q <- Queue.bounded[F, Event](maxBufferedSize)
      _ <- q.dequeue.through(consume).compile.drain.start
      sender <- F.delay[Sender] { evt =>
        val enqueue = for {
          wasEnqueued <- q.offer1(evt)
          _ <- F.delay { println(s"[$evt] enqueued? $wasEnqueued") }
        } yield ()
        enqueue.toIO.unsafeRunAsyncAndForget()
      }
    } yield sender

    suspended.toIO.unsafeRunSync()
  }
}
The main idea is to use a concurrent Queue from fs2. Note that the code above treats both the Sender interface and the logic in main as fixed; only the implementation of the Sender interface is swapped out.
I don't have much experience with this particular library, but it should look something like this:
import cats.effect.{ExitCode, IO, IOApp}
import fs2.concurrent.Queue

case class Event(id: Int)

class JavaProducer {
  new Thread(new Runnable {
    override def run(): Unit = {
      var id = 0
      while (true) {
        Thread.sleep(1000)
        id += 1
        send(Event(id))
      }
    }
  }).start()

  def send(event: Event): Unit = {
    println(s"Original producer prints $event")
  }
}

class HackedProducer(queue: Queue[IO, Event]) extends JavaProducer {
  override def send(event: Event): Unit = {
    println(s"Hacked producer pushes $event")
    queue.enqueue1(event).unsafeRunSync()
    println(s"Hacked producer pushes $event - Pushed")
  }
}

object Test extends IOApp {
  override def run(args: List[String]): IO[ExitCode] = {
    val x: IO[Unit] = for {
      queue <- Queue.unbounded[IO, Event]
      _ = new HackedProducer(queue)
      done <- queue.dequeue.map(ev => {
        println(s"Got $ev")
      }).compile.drain
    } yield done
    x.map(_ => ExitCode.Success)
  }
}
We can create a bounded queue that will consume elements from sender and make them available to fs2 stream processing.
import cats.effect.IO
import cats.effect.std.Queue
import cats.effect.unsafe.implicits.global // runtime required by unsafeRunSync
import fs2.Stream

trait Sender[T]:
  def send(e: T): Unit

object Sender:
  def apply[T](bufferSize: Int): IO[(Sender[T], Stream[IO, T])] =
    for
      q <- Queue.bounded[IO, T](bufferSize)
    yield
      // blocks the calling thread while the queue is full
      val sender: Sender[T] = (e: T) => q.offer(e).unsafeRunSync()
      def stm: Stream[IO, T] = Stream.eval(q.take) ++ stm
      (sender, stm)
Then we have two ends: one for the Java world, to send new elements through the Sender, and another for stream processing in fs2.
import org.junit.Test
import cats.effect.IO
import cats.effect.unsafe.implicits.global

class TestSenderQueue:
  @Test def testSenderQueue: Unit =
    val (sender, stream) = Sender[Int](1)
      .unsafeRunSync() // we have to run it up front to make `sender` available to the external system
    val processing =
      stream
        .map(i => i * i)
        .evalMap { ii => IO { println(ii) } }
    sender.send(1)
    processing.compile.drain.start // NB! we start processing in a separate fiber
      .unsafeRunSync() // immediately, right now.
    sender.send(2)
    Thread.sleep(100)
    (0 until 100).foreach(sender.send)
    println("finished")
Note that we push data from the current thread, so the fs2 processing has to run in a separate fiber (.start).
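A variant closer to the original requirements (send() never blocks, new events are dropped when the buffer is full, events are batched) could look like the sketch below. It assumes cats-effect 3.4+ (for `Dispatcher.sequential`) and fs2 3.x; `DroppingSender`, `ship` and the batching parameters are hypothetical names, `Sender[E]` is a generic stand-in for the Java interface, and the HTTP call is left as a placeholder. Dispatcher is the cats-effect 3 tool for running effects from callback-style code, so the caller's thread only submits work instead of calling unsafeRunSync.

```scala
import cats.effect.{IO, IOApp, Resource}
import cats.effect.std.{Dispatcher, Queue}
import fs2.{Chunk, Stream}
import scala.concurrent.duration._

// Stand-in for the external Java interface from the question.
trait Sender[E] { def send(e: E): Unit }

object DroppingSender {
  // Hypothetical helper: a Sender backed by a bounded queue plus a background
  // fs2 stream that batches events. All parts are released together.
  def resource[E](bufferSize: Int, batchSize: Int, flush: FiniteDuration)(
      ship: Chunk[E] => IO[Unit]
  ): Resource[IO, Sender[E]] =
    for {
      dispatcher <- Dispatcher.sequential[IO]
      queue <- Resource.eval(Queue.bounded[IO, E](bufferSize))
      _ <- Stream
        .fromQueueUnterminated(queue)
        .groupWithin(batchSize, flush) // emit when batchSize is reached or flush elapses
        .evalMap(ship)
        .compile
        .drain
        .background
    } yield new Sender[E] {
      // tryOffer returns false instead of waiting when the buffer is full, so the
      // event is dropped; unsafeRunAndForget returns without blocking the caller.
      def send(e: E): Unit = dispatcher.unsafeRunAndForget(queue.tryOffer(e).void)
    }
}

object DroppingSenderDemo extends IOApp.Simple {
  def run: IO[Unit] =
    DroppingSender.resource[String](bufferSize = 100, batchSize = 10, flush = 1.second) { batch =>
      IO.println(s"shipping ${batch.size} events") // the HTTP request would go here
    }.use { sender =>
      // send() is plain Unit-returning code, as the Java library would call it
      IO((1 to 25).foreach(i => sender.send(s"event-$i"))) >> IO.sleep(2.seconds)
    }
}
```

Events still sitting in the queue when the Resource is released are discarded in this sketch; a graceful shutdown would need an explicit drain step.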
Within an Akka stream stage FlowShape[A, B], part of the processing I need to do on the A's is to save to/query a datastore with a query built from the A data. But the datastore driver's query gives me a future, and I am not sure how best to deal with it (my main question here).
case class Obj(a: String, b: Int, c: String)
case class Foo(myobject: Obj, name: String)
case class Bar(st: String)
//
class SaveAndGetId extends GraphStage[FlowShape[Foo, Bar]] {
  val dao = new DbDao // some dao with an async driver

  override def createLogic(inheritedAttributes: Attributes) = new GraphStageLogic(shape) {
    setHandlers(in, out, new InHandler with OutHandler {
      override def onPush() = {
        val foo = grab(in)
        val add = foo.record.value()
        val result: Future[String] = dao.saveAndGetRecord(add.myobject) // saves and returns id as string

        // the naive approach
        val record = Await.result(result, Duration.Inf)
        push(out, Bar(record)) // ***tests pass every time

        // mapping the future approach
        result.map { x =>
          push(out, Bar(x))
        } // ***tests fail every time
The next stage depends on the id of the db record returned from the query, but I want to avoid Await. I am not sure why the mapping approach fails:
"it should work" in {
  val source = Source.single(Foo(Obj("hello", 1, "world")))
  val probe = source
    .via(new SaveAndGetId)
    .runWith(TestSink.probe[Bar])
  probe
    .request(1)
    .expectBarwithId("one") // say we know this will be
    .expectComplete()
}

private implicit class RichTestProbe(probe: Probe[Bar]) {
  def expectBarwithId(expected: String): Probe[Bar] =
    probe.expectNextChainingPF {
      case r @ Bar(str) if str == expected => r
    }
}
When run with the future-mapping approach, I get this failure:
should work ***FAILED***
java.lang.AssertionError: assertion failed: expected: message matching partial function but got unexpected message OnComplete
at scala.Predef$.assert(Predef.scala:170)
at akka.testkit.TestKitBase$class.expectMsgPF(TestKit.scala:406)
at akka.testkit.TestKit.expectMsgPF(TestKit.scala:814)
at akka.stream.testkit.TestSubscriber$ManualProbe.expectEventPF(StreamTestKit.scala:570)
The async side channels example in the docs has the future in the constructor of the stage, as opposed to building the future within the stage, so doesn't seem to apply to my case.
I agree with Ramon. Building a custom stage with a FlowShape is not necessary in this case and is too complicated. It is much more convenient to use the mapAsync method here.
Here is a code snippet that uses mapAsync:
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

object MapAsyncExample {
  val parallelism: Int = 10

  def main(args: Array[String]): Unit = {
    // Akka 2.6+: the implicit ActorSystem provides the Materializer for runWith
    implicit val system: ActorSystem = ActorSystem("MapAsyncExample")
    Source.repeat(5)
      .mapAsync(parallelism)(x => asyncSquare(x))
      .runWith(Sink.foreach(println))
  }

  // This method returns a Future
  // You can replace this part with your database operations
  def asyncSquare(value: Int): Future[Int] = Future {
    value * value
  }
}
In the snippet above, Source.repeat(5) is a dummy source that emits 5 indefinitely. asyncSquare is a sample function that takes an integer and calculates its square in a Future. The .mapAsync(parallelism)(x => asyncSquare(x)) line uses that function and emits the result of each Future to the next stage. In this snippet, the next stage is a sink that prints every item.
parallelism is the maximum number of asyncSquare calls that can run concurrently; mapAsync still emits the results in the order of the incoming elements.
I think your GraphStage is unnecessarily overcomplicated. The below Flow performs the same actions without the need to write a custom stage:
val dao = new DbDao
val parallelism = 10 // number of parallel db queries

val SaveAndGetId: Flow[Foo, Bar, _] =
  Flow[Foo]
    .map(foo => foo.record.value().myobject)
    .mapAsync(parallelism)(rec => dao.saveAndGetRecord(rec))
    .map(Bar.apply)
I generally try to treat GraphStage as a last resort, there is almost always an idiomatic way of getting the same Flow by using the methods provided by the akka-stream library.
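As for why the `result.map { x => push(out, Bar(x)) }` version in the question fails: GraphStageLogic is not thread-safe, so `push` must not be called from a Future callback, and with `Source.single` the upstream finishes right after its only element, so the default onUpstreamFinish completes the stage before the future returns, which is consistent with the `OnComplete` in the failure. If a custom stage is really needed, Akka's tool for this is `getAsyncCallback`. The sketch below assumes the Akka 2.6 GraphStage API; `AsyncMapStage` is a hypothetical name, it processes one element at a time, and it defers completion while a future is in flight. This is roughly what mapAsync(1) already does, which is why the answers above prefer it.

```scala
import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import scala.concurrent.Future
import scala.util.{Failure, Success, Try}

// Hypothetical stage: applies an async function to one element at a time.
final class AsyncMapStage[A, B](f: A => Future[B]) extends GraphStage[FlowShape[A, B]] {
  val in: Inlet[A] = Inlet("AsyncMapStage.in")
  val out: Outlet[B] = Outlet("AsyncMapStage.out")
  override val shape: FlowShape[A, B] = FlowShape(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with InHandler with OutHandler {
      private var inFlight = false

      // The handler runs inside the stage, so push/failStage are safe here,
      // unlike inside a Future callback.
      private lazy val onResult = getAsyncCallback[Try[B]] {
        case Success(b) =>
          inFlight = false
          push(out, b)
          if (isClosed(in)) completeStage() // upstream finished while we were waiting
        case Failure(ex) =>
          failStage(ex)
      }

      override def onPush(): Unit = {
        inFlight = true
        // The Future only hands its result over to the async callback.
        f(grab(in)).onComplete(onResult.invoke)(materializer.executionContext)
      }

      // Pull only on downstream demand, so at most one element is in flight.
      override def onPull(): Unit = pull(in)

      // The default would complete immediately and lose the pending result.
      override def onUpstreamFinish(): Unit =
        if (!inFlight) completeStage()

      setHandlers(in, out, this)
    }
}
```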
I am new to Scala futures and I have a question about the return value of Scala futures.
The general syntax for a Scala future is
def downloadPage(url: URL) = Future[List[Int]] {
  ??? // download the page and build the list
}
I want to know how to access the List[Int] from some other method which calls this method.
In other words, given
val result = downloadPage("localhost")
what should be the approach to get the List[Int] out of the future?
I have tried using the map method but was not able to do this successfully.
In the case Success(listInt) =>, I want to return listInt and I am not able to figure out how to do that.
The best practice is that you don't return the value. Instead you just pass the future (or a version transformed with map, flatMap, etc.) to everyone who needs this value and they can add their own onComplete.
If you really need to return it (e.g. when implementing a legacy method), then the only thing you can do is to block (e.g. with Await.result) and you need to decide how long to await.
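A minimal sketch of that advice, using a hypothetical `downloadPage` that returns a fixed list: callers keep transforming the Future with map and a for-comprehension, and only `main` blocks, once, with a timeout.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PassTheFuture {
  // Hypothetical stand-in for the real download; it just returns a fixed list.
  def downloadPage(url: String): Future[List[Int]] = Future(List(1, 2, 3))

  // Callers transform the Future instead of extracting the List.
  def pageTotal(url: String): Future[Int] = downloadPage(url).map(_.sum)

  // Combining two futures also stays inside Future.
  def totalOfBoth(a: String, b: String): Future[Int] =
    for {
      x <- pageTotal(a)
      y <- pageTotal(b)
    } yield x + y

  def main(args: Array[String]): Unit = {
    // Block only at the edge of the program, and always with a timeout.
    val total = Await.result(totalOfBoth("localhost", "example.org"), 5.seconds)
    println(total) // 12
  }
}
```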
You need to wait for the future to complete to get the result, given some timeout; here's something that would work:
import java.net.URL
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

def downloadPage(url: URL) = Future[List[Int]] {
  List(1, 2, 3)
}

val result = downloadPage(new URL("http://localhost"))
val myListInt = Await.result(result, 10.seconds)
Ideally, if you're using a Future, you don't want to block the executing thread, so you would move your logic that deals with the result of your Future into the onComplete method, something like this:
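import scala.util.{Failure, Success}

// requires an implicit ExecutionContext in scope, as in the previous snippet
result.onComplete {
  case Success(listInt) =>
    // Do something with my list
  case Failure(exception) =>
    // Do something with my error
}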
I hope you already solved this since it was asked in 2013 but maybe my answer can help someone else:
If you are using Play Framework, it supports async Actions (actually, all Actions are async internally). An easy way to create an async Action is to use Action.async(). You need to provide a Future[Result] to this function.
Now you can simply transform your Future[List[Int]] into a Future[Result] using Scala's map, flatMap, for-comprehension or async/await. Here is an example from the Play Framework documentation:
import play.api.libs.concurrent.Execution.Implicits.defaultContext

def index = Action.async {
  val futureInt = scala.concurrent.Future { intensiveComputation() }
  futureInt.map(i => Ok("Got result: " + i))
}
You can do something like this. If the wait time given to Await.result is shorter than the time the awaitable takes to complete, you will get a TimeoutException, and you need to handle that error (or any other error).
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util.{Try, Success, Failure}
import scala.concurrent.duration._

object MyObject {
  def main(args: Array[String]): Unit = {
    val myVal: Future[String] = Future { silly() }

    // timeouts shorter than 5 seconds will go to the
    // Failure case, because silly() will not be done yet
    Try(Await.result(myVal, 10.seconds)) match {
      case Success(extractedVal) => println("Success Happened: " + extractedVal)
      case Failure(_) => println("Failure Happened")
    }
  }

  def silly(): String = {
    Thread.sleep(5000)
    "Hello from silly"
  }
}
The best way I’ve found to think of a Future is a box that will, at some point, contain the thing that you want. The key thing with a Future is that you never open the box. Trying to force open the box will lead you to blocking and grief. Instead, you put the Future in another, larger box, typically using the map method.
Here’s an example of a Future that contains a String. When the Future completes, then Console.println is called:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object Main {
  def main(args: Array[String]): Unit = {
    val stringFuture: Future[String] = Future.successful("hello world!")
    stringFuture.map { someString =>
      // if you use .foreach you avoid creating an extra Future, but we are proving
      // the concept here...
      Console.println(someString)
    }
  }
}
Note that in this case, we're calling the main method and then… finishing. The string's Future, provided by the global ExecutionContext, does the work of calling Console.println. This is great, because when we give up control over when someString is going to be there and when Console.println is going to be called, we let the system manage itself. (In a standalone program like this one, the JVM may exit before the callback runs, because the global ExecutionContext uses daemon threads.) In contrast, look what happens when we try to force the box open:
val stringFuture: Future[String] = Future.successful("hello world!")
val someString = Await.result(stringFuture, Duration.Inf)
In this case, we have to wait — keep a thread twiddling its thumbs — until we get someString back. We’ve opened the box, but we’ve had to commandeer the system’s resources to get at it.
It hasn't been mentioned yet, so I want to emphasize using Future with a for-comprehension, and the difference between sequential and parallel execution.
For example, for sequential execution:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FuturesSequential extends App {
  def job(n: Int) = Future {
    Thread.sleep(1000)
    println(s"Job $n")
  }

  val f = for {
    f1 <- job(1)
    f2 <- job(2)
    f3 <- job(3)
    f4 <- job(4)
    f5 <- job(5)
  } yield List(f1, f2, f3, f4, f5)
  f.map(res => println(s"Done. ${res.size} jobs run"))
  Thread.sleep(6000) // We need to prevent main thread from quitting too early
}
And for parallel execution (note that the Futures are created before the for-comprehension):
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object FuturesParallel extends App {
  def job(n: Int) = Future {
    Thread.sleep(1000)
    println(s"Job $n")
  }

  val j1 = job(1)
  val j2 = job(2)
  val j3 = job(3)
  val j4 = job(4)
  val j5 = job(5)

  val f = for {
    f1 <- j1
    f2 <- j2
    f3 <- j3
    f4 <- j4
    f5 <- j5
  } yield List(f1, f2, f3, f4, f5)
  f.map(res => println(s"Done. ${res.size} jobs run"))
  Thread.sleep(6000) // We need to prevent main thread from quitting too early
}
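The same parallel behaviour is often written with Future.sequence: building the list of futures starts every job right away, and waiting on the combined Future replaces the fixed Thread.sleep. This sketch also wraps the sleep in scala.concurrent.blocking, which lets the global ExecutionContext add threads so the five jobs can overlap even on a machine with fewer cores.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FuturesSequence {
  def job(n: Int): Future[Int] = Future {
    // `blocking` lets the global pool add threads, so the sleeps overlap
    blocking(Thread.sleep(1000))
    println(s"Job $n")
    n
  }

  def main(args: Array[String]): Unit = {
    // map calls job(n) for every element right away, so all five jobs start
    // before Future.sequence combines them into a single Future[List[Int]].
    val all: Future[List[Int]] = Future.sequence(List(1, 2, 3, 4, 5).map(job))

    // Waiting on the combined Future replaces the fixed Thread.sleep(6000).
    val results = Await.result(all, 10.seconds)
    println(s"Done. ${results.size} jobs run") // Done. 5 jobs run
  }
}
```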
It's possible to create sources and sinks from actors using Source.actorPublisher() and Sink.actorSubscriber() methods respectively. But is it possible to create a Flow from actor?
Conceptually there doesn't seem to be a good reason not to, given that it implements both ActorPublisher and ActorSubscriber traits, but unfortunately, the Flow object doesn't have any method for doing this. In this excellent blog post it's done in an earlier version of Akka Streams, so the question is if it's possible also in the latest (2.4.9) version.
I'm part of the Akka team and would like to use this question to clarify a few things about the raw Reactive Streams interfaces. I hope you'll find this useful.
Most notably, we'll be posting multiple posts on the Akka team blog about building custom stages, including Flows, soon, so keep an eye on it.
Don't use ActorPublisher / ActorSubscriber
Please don't use ActorPublisher and ActorSubscriber. They're too low level, and you might end up implementing them in a way that violates the Reactive Streams specification. They're a relic of the past, and even then were "power-user mode only". There really is no reason to use those classes nowadays. We never provided a way to build a flow because the complexity would be explosive if it were exposed as a "raw" Actor API for you to implement while getting all the rules right.
If you really, really want to implement the raw Reactive Streams interfaces, then please do use the specification's TCK to verify that your implementation is correct. You will likely be caught off guard by some of the more complex corner cases a Flow (or, in RS terminology, a Processor) has to handle.
Most operations are possible to build without going low-level
You should be able to build many flows simply by starting from a Flow[T] and adding the needed operations to it, for example:
val newFlow: Flow[String, Int, NotUsed] = Flow[String].map(_.toInt)
Which is a reusable description of the Flow.
Since you're asking about power-user mode, this is the most powerful operator on the DSL itself: statefulMapConcat. The vast majority of operations on plain stream elements are expressible using it: Flow.statefulMapConcat[T](f: () ⇒ (Out) ⇒ Iterable[T]): Repr[T].
If you need timers, you could zip with a Source.tick, etc.
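As an illustration of statefulMapConcat (a sketch assuming Akka 2.6, where an implicit ActorSystem provides the Materializer), here is a hypothetical flow that drops consecutive duplicates. The outer function is called once per materialization, so the mutable `last` variable is not shared between stream runs.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object DedupExample {
  // Drops an element when it equals the previous one. The outer () => ... factory
  // runs once per materialization, so every stream run gets its own `last`.
  def dropConsecutiveDuplicates[A]: Flow[A, A, NotUsed] =
    Flow[A].statefulMapConcat[A] { () =>
      var last: Option[A] = None
      elem =>
        if (last.contains(elem)) Nil
        else {
          last = Some(elem)
          List(elem)
        }
    }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("dedup")
    val result = Await.result(
      Source(List(1, 1, 2, 2, 2, 3, 1)).via(dropConsecutiveDuplicates[Int]).runWith(Sink.seq),
      3.seconds
    )
    println(result.toList) // List(1, 2, 3, 1)
    system.terminate()
  }
}
```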
GraphStage is the simplest and safest API to build custom stages
Instead, building Sources/Flows/Sinks has its own powerful and safe API: the GraphStage. Please read the documentation about building custom GraphStages (they can be a Sink/Source/Flow or even any arbitrary shape). It handles all of the complex Reactive Streams rules for you, while giving you full freedom and type-safety while implementing your stages (which could be a Flow).
For example, here is a GraphStage implementation of the filter(T => Boolean) operator, taken from the docs:
class Filter[A](p: A => Boolean) extends GraphStage[FlowShape[A, A]] {

  val in = Inlet[A]("Filter.in")
  val out = Outlet[A]("Filter.out")

  val shape = FlowShape.of(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) {
      setHandler(in, new InHandler {
        override def onPush(): Unit = {
          val elem = grab(in)
          if (p(elem)) push(out, elem)
          else pull(in)
        }
      })
      setHandler(out, new OutHandler {
        override def onPull(): Unit = {
          pull(in)
        }
      })
    }
}
It also handles asynchronous channels and is fusable by default.
In addition to the docs, these blog posts explain in detail why this API is the holy grail of building custom stages of any shape:
Akka team blog: Mastering GraphStages (part 1, introduction) - a high level overview
... tomorrow we'll publish one about its API as well...
Kunicki blog: Implementing a Custom Akka Streams Graph Stage - another example implementing sources (really applies 1:1 to building Flows)
Konrad's solution demonstrates how to create a custom stage that utilizes Actors, but in most cases I think that is a bit overkill.
Usually you have some Actor that is capable of responding to questions:
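import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Future
import scala.reflect.ClassTag

// Ask the actor and cast the reply; fails with an AskTimeoutException if no reply arrives in time.
def queryActor[Input, Output: ClassTag](actorRef: ActorRef)(implicit timeout: Timeout): Input => Future[Output] =
  input => (actorRef ? input).mapTo[Output]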
This can be easily utilized with basic Flow functionality which takes in the maximum number of concurrent requests:
import akka.NotUsed
import akka.stream.scaladsl.Flow
def actorQueryFlow[Input, Output](parallelism: Int)(query: Input => Future[Output]): Flow[Input, Output, NotUsed] =
  Flow[Input].mapAsync(parallelism)(query)
Now actorQueryFlow can be integrated into any stream...
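Putting the ask pattern and mapAsync together with a concrete actor could look like this sketch, assuming Akka 2.6 (an implicit ActorSystem acts as the Materializer). `SquareActor` is a hypothetical actor that replies with the square of each Int; mapAsync keeps the input order even though up to 4 asks may be outstanding.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.stream.scaladsl.{Flow, Sink, Source}
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical actor that answers every Int with its square.
class SquareActor extends Actor {
  def receive: Receive = { case n: Int => sender() ! n * n }
}

object AskFlowExample {
  implicit val timeout: Timeout = Timeout(3.seconds) // how long each ask may wait for a reply

  def squares(xs: List[Int])(implicit system: ActorSystem): List[Int] = {
    val squarer = system.actorOf(Props(new SquareActor))
    // Up to 4 asks may be outstanding at once; results are emitted in input order.
    val askFlow = Flow[Int].mapAsync(4)(n => (squarer ? n).mapTo[Int])
    Await.result(Source(xs).via(askFlow).runWith(Sink.seq), 5.seconds).toList
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("ask-flow")
    println(squares(List(1, 2, 3, 4, 5))) // List(1, 4, 9, 16, 25)
    system.terminate()
  }
}
```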
Here is a solution built using a graph stage. The actor has to acknowledge every message in order to have back-pressure. The actor is notified when the stream fails or completes, and the stream fails when the actor terminates.
This can be useful if you don't want to use ask, e.g. when not every input message has a corresponding output message.
import akka.actor.{ActorRef, Status, Terminated}
import akka.stream._
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}

object ActorRefBackpressureFlowStage {
  case object StreamInit
  case object StreamAck
  case object StreamCompleted
  case class StreamFailed(ex: Throwable)
  case class StreamElementIn[A](element: A)
  case class StreamElementOut[A](element: A)
}

/**
 * Sends the elements of the stream to the given `ActorRef`, which sends back a back-pressure signal.
 * The first message is always `StreamInit`; then the stream waits for the acknowledgement message
 * `StreamAck` from the given actor, which means that it is ready to process
 * elements. It also requires a `StreamAck` message after each stream element
 * to make backpressure work. Stream elements are wrapped inside `StreamElementIn(elem)` messages.
 *
 * The target actor can emit elements at any time by sending a `StreamElementOut(elem)` message, which will
 * be emitted downstream when there is demand.
 *
 * If the target actor terminates, the stage will fail with a WatchedActorTerminatedException.
 * When the stream is completed successfully, a `StreamCompleted` message
 * will be sent to the destination actor.
 * When the stream is completed with failure, a `StreamFailed(ex)` message will be sent to the destination actor.
 */
class ActorRefBackpressureFlowStage[In, Out](private val flowActor: ActorRef) extends GraphStage[FlowShape[In, Out]] {

  import ActorRefBackpressureFlowStage._

  val in: Inlet[In] = Inlet("ActorFlowIn")
  val out: Outlet[Out] = Outlet("ActorFlowOut")

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) {

    private lazy val self = getStageActor {
      case (_, StreamAck) =>
        if (firstPullReceived) {
          if (!isClosed(in) && !hasBeenPulled(in)) {
            pull(in)
          }
        } else {
          pullOnFirstPullReceived = true
        }

      case (_, StreamElementOut(elemOut)) =>
        val elem = elemOut.asInstanceOf[Out]
        emit(out, elem)

      case (_, Terminated(targetRef)) =>
        failStage(new WatchedActorTerminatedException("ActorRefBackpressureFlowStage", targetRef))

      case (actorRef, unexpected) =>
        failStage(new IllegalStateException(s"Unexpected message: `$unexpected` received from actor `$actorRef`."))
    }

    var firstPullReceived: Boolean = false
    var pullOnFirstPullReceived: Boolean = false

    override def preStart(): Unit = {
      // initialize stage actor and watch flow actor.
      self.watch(flowActor)
      tellFlowActor(StreamInit)
    }

    setHandler(in, new InHandler {
      override def onPush(): Unit = {
        val elementIn = grab(in)
        tellFlowActor(StreamElementIn(elementIn))
      }

      override def onUpstreamFailure(ex: Throwable): Unit = {
        tellFlowActor(StreamFailed(ex))
        super.onUpstreamFailure(ex)
      }

      override def onUpstreamFinish(): Unit = {
        tellFlowActor(StreamCompleted)
        super.onUpstreamFinish()
      }
    })

    setHandler(out, new OutHandler {
      override def onPull(): Unit = {
        if (!firstPullReceived) {
          firstPullReceived = true
          if (pullOnFirstPullReceived) {
            if (!isClosed(in) && !hasBeenPulled(in)) {
              pull(in)
            }
          }
        }
      }

      override def onDownstreamFinish(): Unit = {
        tellFlowActor(StreamCompleted)
        super.onDownstreamFinish()
      }
    })

    private def tellFlowActor(message: Any): Unit = {
      flowActor.tell(message, self.ref)
    }
  }

  override def shape: FlowShape[In, Out] = FlowShape(in, out)
}