How to deal with a Future inside a custom Akka Sink? - scala

I am trying to implement a custom Akka Sink, but I could not find a way to handle a Future inside it.
class EventSink(...) extends GraphStage[SinkShape[EventEnvelope2]] {
val in: Inlet[EventEnvelope2] = Inlet("EventSink")
override val shape: SinkShape[EventEnvelope2] = SinkShape(in)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = {
new GraphStageLogic(shape) {
// This requests one element at the Sink startup.
override def preStart(): Unit = pull(in)
setHandler(in, new InHandler {
override def onPush(): Unit = {
val future = handle(grab(in))
Await.ready(future, Duration.Inf)
/*
future.onComplete {
case Success(_) =>
logger.info("pulling next events")
pull(in)
case Failure(failure) =>
logger.error(failure.getMessage, failure)
throw failure
}*/
pull(in)
}
})
}
}
private def handle(envelope: EventEnvelope2): Future[Unit] = {
val EventEnvelope2(query.Sequence(offset), _/*persistenceId*/, _/*sequenceNr*/, event) = envelope
...
db.run(statements.transactionally)
}
}
I have to block on the future at the moment, which does not look good. The non-blocking version I commented out only works for the first event. Could anyone please help?
Update: Thanks @ViktorKlang. It seems to be working now.
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
{
new GraphStageLogic(shape) {
val callback = getAsyncCallback[Try[Unit]] {
case Success(_) =>
//completeStage()
pull(in)
case Failure(error) =>
failStage(error)
}
// This requests one element at the Sink startup.
override def preStart(): Unit = {
pull(in)
}
setHandler(in, new InHandler {
override def onPush(): Unit = {
val future = handle(grab(in))
future.onComplete { result =>
callback.invoke(result)
}
}
})
}
}
I am trying to implement a relational DB event sink connected to ReadJournal.eventsByTag. This is a continuous stream that never ends unless there is an error, which is what I want. Is my approach correct?
Two more questions:
Will the GraphStage never end unless I manually invoke completeStage or failStage?
Is it correct (or normal) to declare the callback outside the preStart method? And is it right to invoke pull(in) in preStart in this case?
Thanks,
Cheng

Avoid Custom Stages
In general, you should try to exhaust all possibilities with the given methods of the library's Source, Flow, and Sink. Custom stages are almost never necessary and make your code difficult to maintain.
Writing Your "Custom" Stage Using Standard Methods
Based on the details of your question's example code I don't see any reason why you would use a custom Sink to begin with.
Given your handle method, you could slightly modify it to do the logging that you specified in the question:
val loggedHandle : (EventEnvelope2) => Future[Unit] =
handle(_) transform {
case Success(_) => {
logger.info("pulling next events")
Success(()) // the Unit value; Success(Unit) would wrap the Unit companion object instead
}
case Failure(failure) => {
logger.error(failure.getMessage, failure)
Failure(failure)
}
}
Then use Sink.foreachAsync to handle the envelopes. It is available from Akka 2.5.17 and accepts a function that returns a Future; the older Sink.foreachParallel expects a function returning Unit, so the Future returned by handle would be discarded:
val createEventEnvelope2Sink : (Int) => Sink[EventEnvelope2, Future[Done]] =
(parallelism) =>
Sink.foreachAsync[EventEnvelope2](parallelism)(loggedHandle)
Now, even if you want each EventEnvelope2 to be sent to the db in order, you can just use 1 for parallelism:
val inOrderDBInsertSink : Sink[EventEnvelope2, Future[Done]] =
createEventEnvelope2Sink(1)
Also, if the database throws an exception you can still get hold of it when the sink's materialized Future completes:
val someEnvelopeSource : Source[EventEnvelope2, _] = ???
someEnvelopeSource
.runWith(createEventEnvelope2Sink(1)) // runWith keeps the sink's Future[Done]; .to(...).run() would discard it
.andThen {
case Failure(throwable) => { /*deal with db exception*/ }
case Success(_) => { /*all inserts succeeded*/ }
}
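To make the behaviour described above concrete, here is a minimal, self-contained sketch. It assumes Akka 2.6+ (so the implicit ActorSystem also supplies the materializer) and uses a hypothetical fakeHandle on plain Int events in place of the question's handle and db.run; it is not the asker's actual persistence code.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Assumes Akka 2.6+, where the implicit ActorSystem also provides the materializer.
implicit val system: ActorSystem = ActorSystem("event-sink-sketch")
import system.dispatcher

// Hypothetical stand-in for handle(...) / db.run(...): records the event asynchronously.
val stored = new AtomicReference(Vector.empty[Int])
def fakeHandle(event: Int): Future[Unit] =
  Future { stored.updateAndGet(v => v :+ event); () }

// With parallelism 1 the next event is handled only after the previous Future completes.
val ok: Future[Done] = Source(1 to 5).runWith(Sink.foreachAsync[Int](1)(fakeHandle))
Await.result(ok, 3.seconds)
val storedInOrder = stored.get

// A failed Future fails the stream, and the failure surfaces in the materialized Future[Done].
val failing: Future[Done] = Source(1 to 3).runWith(
  Sink.foreachAsync[Int](1) { event =>
    if (event == 2) Future.failed(new RuntimeException("db down")) else fakeHandle(event)
  })
val failure = Await.ready(failing, 3.seconds).value.get

system.terminate()
```

The Await calls are only there to make the sketch observable; in an application you would compose on the returned Future instead.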

Related

Akka Streams: handling Future inside GraphStage Source

I am trying to build an Akka Stream Source that receives data by making Future-based API calls (the API is a scrolling one that fetches results incrementally). To build such a Source, I am using GraphStage.
I have modified the NumberSource example which simply pushes an Int at a time. The only change I did was to replace that Int with getvalue(): Future[Int] (to simulate the API call):
class NumbersSource extends GraphStage[SourceShape[Int]] {
val out: Outlet[Int] = Outlet("NumbersSource")
override val shape: SourceShape[Int] = SourceShape(out)
// simple example of future API call
private def getvalue(): Future[Int] = Future.successful(Random.nextInt())
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
new GraphStageLogic(shape) {
setHandler(out, new OutHandler {
override def onPull(): Unit = {
// Future API call
getvalue().onComplete{
case Success(value) =>
println("Pushing value received..") // this is currently being printed just once
push(out, value)
case Failure(exception) =>
}
}
})
}
}
// Using the Source and Running the stream
val sourceGraph: Graph[SourceShape[Int], NotUsed] = new NumbersSource
val mySource: Source[Int, NotUsed] = Source.fromGraph(sourceGraph)
val done: Future[Done] = mySource.runForeach{
num => println(s"Received: $num") // This is currently not printed
}
done.onComplete(_ => system.terminate())
The above code doesn't work. The println statement inside the handler is executed just once and nothing is pushed downstream.
How should such Future calls be handled? Thanks.
UPDATE
I tried to use getAsyncCallback by making the following changes:
class NumbersSource(futureNum: Future[Int]) extends GraphStage[SourceShape[Int]] {
val out: Outlet[Int] = Outlet("NumbersSource")
override val shape: SourceShape[Int] = SourceShape(out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
new GraphStageLogic(shape) {
override def preStart(): Unit = {
val callback = getAsyncCallback[Int] { (_) =>
completeStage()
}
futureNum.foreach(callback.invoke)
}
setHandler(out, new OutHandler {
override def onPull(): Unit = {
val value: Int = ??? // How to get this value ??
push(out, value)
}
})
}
}
// Using the Source and Running the Stream
def random(): Future[Int] = Future.successful(Random.nextInt())
val sourceGraph: Graph[SourceShape[Int], NotUsed] = new NumbersSource(random())
val mySource: Source[Int, NotUsed] = Source.fromGraph(sourceGraph)
val done: Future[Done] = mySource.runForeach{
num => println(s"Received: $num") // This is currently not printed
}
done.onComplete(_ => system.terminate())
But now I am stuck on how to get the value computed by the Future. In the case of a Flow-shaped GraphStage, I could use:
val value = grab(in) // where in is Inlet of a Flow
But what I have is a Source-shaped GraphStage, so I have no idea how to get the Int value of the Future computed above.
I'm not sure if I understand correctly, but if you are trying to implement an infinite source out of elements computed in Futures then there is really no need to do it with your own GraphStage. You can do it simply as below:
Source.repeat(())
.mapAsync(parallelism) { _ => Future.successful(Random.nextInt()) }
The Source.repeat(()) is simply an infinite source of some arbitrary values (of type Unit in this case, but you can change () to whatever you want since it's ignored here). mapAsync then is used to integrate the asynchronous computations into the flow.
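As a rough illustration of the approach just described, the sketch below runs the repeat-plus-mapAsync source and collects the first three values. It assumes Akka 2.6+ (implicit ActorSystem as materializer); fetchValue is a hypothetical stand-in for the real Future-returning API call.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.Random

// Assumes Akka 2.6+, where the implicit ActorSystem also provides the materializer.
implicit val system: ActorSystem = ActorSystem("repeat-sketch")
import system.dispatcher

// Hypothetical stand-in for the asynchronous API call.
def fetchValue(): Future[Int] = Future(Random.nextInt(100))

// mapAsync calls fetchValue only when downstream demand allows, so backpressure is respected.
val firstThree: Seq[Int] = Await.result(
  Source.repeat(())
    .mapAsync(parallelism = 1)(_ => fetchValue())
    .take(3)
    .runWith(Sink.seq),
  3.seconds)

system.terminate()
```

Without the take(3) the source is infinite, which matches the "infinite source" reading of the question.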
I agree with the other answer: try to avoid creating your own GraphStage. After some experimentation, this is what seems to work for me:
type Data = Int
trait DbResponse {
// this is just a callback for a compact solution
def nextPage: Option[() => Future[DbResponse]]
def data: List[Data]
}
def createSource(dbCall: DbResponse): Source[Data, NotUsed] = {
val thisPageSource = Source.apply(dbCall.data)
val nextPageSource = dbCall.nextPage match {
case Some(dbCallBack) => Source.lazySource(() => Source.future(dbCallBack()).flatMapConcat(createSource))
case None => Source.empty
}
thisPageSource.concat(nextPageSource)
}
val dataSource: Source[Data, NotUsed] = Source
.future(???: Future[DbResponse]) // the first db call
.flatMapConcat(createSource)
I tried it out and it works almost perfectly. The second page is requested immediately, and I could not find out why, but the remaining pages are requested as expected (with backpressure and so on).
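A different built-in operator that is often used for scrolling or paged APIs is Source.unfoldAsync, which fetches the next page only when downstream asks for more. The sketch below is an alternative to the recursive createSource above, not a rewrite of it; Page and fetchPage are hypothetical, and Akka 2.6+ is assumed.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Assumes Akka 2.6+, where the implicit ActorSystem also provides the materializer.
implicit val system: ActorSystem = ActorSystem("paging-sketch")
import system.dispatcher

// Hypothetical page type: some data plus the number of the next page, if any.
final case class Page(data: List[Int], next: Option[Int])

// Hypothetical paged API with three pages (0, 1 and 2).
def fetchPage(n: Int): Future[Page] =
  Future.successful(Page(List(n * 10, n * 10 + 1), if (n < 2) Some(n + 1) else None))

// The state is the next page number; None means there is nothing left to fetch.
val data: Source[Int, NotUsed] =
  Source.unfoldAsync[Option[Int], List[Int]](Option(0)) {
    case Some(n) => fetchPage(n).map(page => Some((page.next, page.data)))
    case None    => Future.successful(None)
  }.mapConcat(identity) // flatten each page into single elements

val all: Seq[Int] = Await.result(data.runWith(Sink.seq), 3.seconds)
system.terminate()
```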

MVar tryPut returns true and isEmpty also returns true

I wrote a simple callback (handler) function which I pass to an async API, and I want to wait for the result:
object Handlers {
val logger: Logger = Logger("Handlers")
implicit val cs: ContextShift[IO] =
IO.contextShift(ExecutionContext.Implicits.global)
class DefaultHandler[A] {
val response: IO[MVar[IO, A]] = MVar.empty[IO, A]
def onResult(obj: Any): Unit = {
obj match {
case obj: A =>
println(response.flatMap(_.tryPut(obj)).unsafeRunSync())
println(response.flatMap(_.isEmpty).unsafeRunSync())
case _ => logger.error("Wrong expected type")
}
}
def getResponse: A = {
response.flatMap(_.take).unsafeRunSync()
}
}
But for some reason both tryPut and isEmpty return true (when I call the onResult method manually), so when I call getResponse it blocks forever.
This is my test:
class HandlersTest extends FunSuite {
test("DefaultHandler.test") {
val handler = new DefaultHandler[Int]
handler.onResult(3)
val response = handler.getResponse
assert(response != 0)
}
}
Can somebody explain why tryPut returns true but nothing is put? And what is the right way to use MVar/channels in Scala?
IO[X] means that you have a recipe to create some X. So in your example, you are putting into one MVar and then asking another.
Here is how I would do it.
object Handlers {
trait DefaultHandler[A] {
def onResult(obj: Any): IO[Unit]
def getResponse: IO[A]
}
object DefaultHandler {
def apply[A : ClassTag]: IO[DefaultHandler[A]] =
MVar.empty[IO, A].map { response =>
new DefaultHandler[A] {
override def onResult(obj: Any): IO[Unit] = obj match {
case obj: A =>
for {
r1 <- response.tryPut(obj)
_ <- IO(println(r1))
r2 <- response.isEmpty
_ <- IO(println(r2))
} yield ()
case _ =>
IO(logger.error("Wrong expected type"))
}
override def getResponse: IO[A] =
response.take
}
}
}
}
The "unsafe" is sort of a hint, but every time you call unsafeRunSync, you should basically think of it as an entire new universe. Before you make the call, you can only describe instructions for what will happen, you can't actually change anything. During the call is when all the changes occur. Once the call completes, that universe is destroyed, and you can read the result but no longer change anything. What happens in one unsafeRunSync universe doesn't affect another.
You need to call it exactly once in your test code. That means your test code needs to look something like:
val test = for {
handler <- Handlers.DefaultHandler[Int]
_ <- handler.onResult(3)
response <- handler.getResponse
} yield response
assert(test.unsafeRunSync() == 3)
Note this doesn't really buy you much over just using the MVar directly. I think you're trying to mix side effects inside IO and outside it, but that doesn't work. All the side effects need to be inside.
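The "recipe" and "new universe" points made in these answers can be reproduced without cats-effect. The sketch below uses a toy ToyIO and Cell, both hypothetical stand-ins written only for this illustration (they are not the cats-effect IO or MVar APIs), to show why running the same description twice yields two independent cells.

```scala
import java.util.concurrent.atomic.AtomicReference

// Toy stand-in for IO: a description of a computation that does nothing until it is run.
final class ToyIO[A](thunk: () => A) {
  def unsafeRunSync(): A = thunk()
  def map[B](f: A => B): ToyIO[B] = new ToyIO(() => f(thunk()))
  def flatMap[B](f: A => ToyIO[B]): ToyIO[B] = new ToyIO(() => f(thunk()).unsafeRunSync())
}
object ToyIO { def apply[A](a: => A): ToyIO[A] = new ToyIO(() => a) }

// Toy stand-in for MVar: a cell that holds at most one value.
final class Cell[A] {
  private val ref = new AtomicReference[Option[A]](None)
  def tryPut(a: A): ToyIO[Boolean] = ToyIO(ref.compareAndSet(None, Some(a)))
  def isEmpty: ToyIO[Boolean] = ToyIO(ref.get.isEmpty)
}
object Cell { def empty[A]: ToyIO[Cell[A]] = ToyIO(new Cell[A]) }

// As in the question: `response` is a recipe, so every run allocates a fresh cell.
val response: ToyIO[Cell[Int]] = Cell.empty[Int]
val put1 = response.flatMap(_.tryPut(3)).unsafeRunSync()  // puts into cell #1
val empty1 = response.flatMap(_.isEmpty).unsafeRunSync()  // inspects a brand new cell #2

// As in the answers: one program, one run, one shared cell.
val program = for {
  cell  <- Cell.empty[Int]
  put   <- cell.tryPut(3)
  empty <- cell.isEmpty
} yield (put, empty)
val (put2, empty2) = program.unsafeRunSync()
```

In the first variant both results are true, which is exactly the symptom from the question; in the second the put succeeds and the cell is then reported as non-empty.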

Akka Stream - Pausable GraphStage (Akka 2.5.7)

I would like to write a GraphStage which can be paused/unpaused by sending a message from another actor.
The code snippet below shows a simple GraphStage which generates random numbers. When the stage gets materialized, the GraphStageLogic sends a message (within preStart()) containing the StageActor to a supervisor. The supervisor keeps the stage's ActorRef and can therefore be used to control the stage.
object RandomNumberSource {
case object Pause
case object UnPause
}
class RandomNumberSource(supervisor: ActorRef) extends GraphStage[SourceShape[Int]] {
val out: Outlet[Int] = Outlet("rnd.out")
override val shape: SourceShape[Int] = SourceShape(out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = {
new RandomNumberSourceLogic(shape)
}
private class RandomNumberSourceLogic(shape: Shape) extends GraphStageLogic(shape) with StageLogging {
lazy val self: StageActor = getStageActor(onMessage)
val numberGenerator: Random = Random
var isPaused: Boolean = true
override def preStart(): Unit = {
supervisor ! AssignStageActor(self.ref)
}
setHandler(out, new OutHandler {
override def onPull(): Unit = {
if (!isPaused) {
push(out, numberGenerator.nextInt())
Thread.sleep(1000)
}
}
})
private def onMessage(x: (ActorRef, Any)): Unit =
{
x._2 match {
case Pause =>
isPaused = true
log.info("Stream paused")
case UnPause =>
isPaused = false
getHandler(out).onPull()
log.info("Stream unpaused!")
case _ =>
}
}
}
}
This is a very simple implementation of the supervisor actor:
object Supervisor {
case class AssignStageActor(ref: ActorRef)
}
class Supervisor extends Actor with ActorLogging {
var stageActor: Option[ActorRef] = None
override def receive: Receive = {
case AssignStageActor(ref) =>
log.info("Stage assigned!")
stageActor = Some(ref)
ref ! Done
case Pause =>
log.info("Pause stream!")
stageActor match {
case Some(ref) => ref ! Pause
case _ =>
}
case UnPause =>
log.info("UnPause stream!")
stageActor match {
case Some(ref) => ref ! UnPause
case _ =>
}
}
}
I'm using the following application to run the stream:
object Application extends App {
implicit val system = ActorSystem("my-actor-system")
implicit val materializer = ActorMaterializer()
val supervisor = system.actorOf(Props[Supervisor], "supervisor")
val sourceGraph: Graph[SourceShape[Int], NotUsed] = new RandomNumberSource(supervisor)
val randomNumberSource: Source[Int, NotUsed] = Source.fromGraph(sourceGraph)
randomNumberSource.take(100).runForeach(println)
println("Start stream by pressing any key")
StdIn.readLine()
supervisor ! UnPause
StdIn.readLine()
supervisor ! Pause
StdIn.readLine()
println("=== Terminating ===")
system.terminate()
}
When the application starts, the stage is in the 'paused' state and does not produce any numbers. When I press a key, my stage starts to emit numbers. But my problem is that all messages sent to the stage after it has been started are ignored. I cannot pause the stage.
I'm interested in changing the behavior of a stage based on a message received from an actor, but all the examples I found pass an actor's message into the stream.
Does anybody have a guess why my code does not work, or an idea how to build such a GraphStage?
Thank you very much!
The Akka Stream Contrib project has a Valve stage that materializes a value that can pause and resume a flow. From the Scaladoc for this class:
Materializes into a Future of ValveSwitch which provides the method flip that stops or restarts the flow of elements passing through the stage. As long as the valve is closed it will backpressure.
For example:
val (switchFut, seqSink) = Source(1 to 10)
.viaMat(new Valve(SwitchMode.Close))(Keep.right)
.toMat(Sink.seq)(Keep.both)
.run()
switchFut is a Future[ValveSwitch], and since the switch is closed initially, the valve backpressures and nothing is emitted downstream. To open the valve:
switchFut.onComplete {
case Success(switch) =>
switch.flip(SwitchMode.Open) // Future[Boolean]
case _ =>
log.error("the valve failed")
}
More examples are in ValveSpec.
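If pulling in Akka Stream Contrib is not an option, a pausable source can also be sketched directly. The version below is an assumption-laden illustration rather than a drop-in fix for the question's code: it avoids Thread.sleep inside onPull (which blocks the thread the stage runs on), pushes on UnPause only when downstream demand is already pending, and exposes the stage actor's ActorRef as a materialized Future instead of sending it to a supervisor. PausableRandomSource is a hypothetical name, and Akka 2.6+ is assumed.

```scala
import akka.actor.{ActorRef, ActorSystem}
import akka.stream.{Attributes, Outlet, SourceShape}
import akka.stream.scaladsl.{Keep, Sink, Source}
import akka.stream.stage.{GraphStageLogic, GraphStageWithMaterializedValue, OutHandler}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Random

case object Pause
case object UnPause

// Hypothetical stage: starts paused and materializes the ActorRef used to control it.
class PausableRandomSource
    extends GraphStageWithMaterializedValue[SourceShape[Int], Future[ActorRef]] {
  val out: Outlet[Int] = Outlet("rnd.out")
  override val shape: SourceShape[Int] = SourceShape(out)

  override def createLogicAndMaterializedValue(
      inheritedAttributes: Attributes): (GraphStageLogic, Future[ActorRef]) = {
    val stageRef = Promise[ActorRef]()
    val logic = new GraphStageLogic(shape) {
      private var paused = true

      // The stage actor must be created from inside the stage, e.g. in preStart.
      override def preStart(): Unit =
        stageRef.success(getStageActor(onMessage).ref)

      private def onMessage(msg: (ActorRef, Any)): Unit = msg._2 match {
        case Pause   => paused = true
        case UnPause =>
          paused = false
          // Serve demand that arrived while paused; otherwise wait for the next onPull.
          if (isAvailable(out)) push(out, Random.nextInt())
        case _ => ()
      }

      setHandler(out, new OutHandler {
        // While paused the demand is simply left pending, which backpressures nothing upstream
        // (this is a source) and emits nothing downstream.
        override def onPull(): Unit = if (!paused) push(out, Random.nextInt())
      })
    }
    (logic, stageRef.future)
  }
}

// Assumes Akka 2.6+, where the implicit ActorSystem also provides the materializer.
implicit val system: ActorSystem = ActorSystem("pausable-sketch")

val (refFuture, resultFuture) = Source.fromGraph(new PausableRandomSource)
  .take(3)
  .toMat(Sink.seq)(Keep.both)
  .run()

val control: ActorRef = Await.result(refFuture, 3.seconds)
control ! UnPause // nothing is emitted until this message arrives
val numbers: Seq[Int] = Await.result(resultFuture, 3.seconds)
system.terminate()
```

To slow the stream down, a downstream throttle operator is a non-blocking alternative to sleeping inside the stage.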

Composing BodyParser in Play 2.5

Given a function with this signature:
def parser[A](otherParser: BodyParser[A]): BodyParser[A]
How can I write the function in such a way that the request body is examined and verified before it is passed to otherParser?
For simplicity let's say that I want to verify that a header ("Some-Header", perhaps) has a value that matches the body exactly. So if I have this action:
def post() = Action(parser(parse.tolerantText)) { request =>
Ok(request.body)
}
When I make a request like curl -H "Some-Header: hello" -d "hello" http://localhost:9000/post it should return "hello" in the response body with a status of 200. If my request is curl -H "Some-Header: hello" -d "hi" http://localhost:9000/post it should return a 400 with no body.
Here's what I've tried.
This one does not compile because otherParser(request).through(flow) expects flow to output a ByteString. The idea here was that the flow could notify the accumulator whether or not to continue processing via the Either output. I'm not sure how to let the accumulator know the status of the previous step.
def parser[A](otherParser: BodyParser[A]): BodyParser[A] = BodyParser { request =>
val flow: Flow[ByteString, Either[Result, ByteString], NotUsed] = Flow[ByteString].map { bytes =>
if (request.headers.get("Some-Header").contains(bytes.utf8String)) {
Right(bytes)
} else {
Left(BadRequest)
}
}
val acc: Accumulator[ByteString, Either[Result, A]] = otherParser(request)
// This fails to compile because flow needs to output a ByteString
acc.through(flow)
}
I also attempted to use filter. This one does compile and the response body that gets written is correct. However it always returns a 200 Ok response status.
def parser[A](otherParser: BodyParser[A]): BodyParser[A] = BodyParser { request =>
val flow: Flow[ByteString, ByteString, akka.NotUsed] = Flow[ByteString].filter { bytes =>
request.headers.get("Some-Header").contains(bytes.utf8String)
}
val acc: Accumulator[ByteString, Either[Result, A]] = otherParser(request)
acc.through(flow)
}
I came up with a solution using a GraphStageWithMaterializedValue. This concept was borrowed from Play's maxLength body parser. The key difference from the first attempt in my question (the one that doesn't compile) is that, instead of attempting to change the element type of the stream, I should use the materialized value to convey information about the state of processing. While I had created a Flow[ByteString, Either[Result, ByteString], NotUsed], it turns out what I needed was a Flow[ByteString, ByteString, Future[Boolean]].
So with that, my parser function ends up looking like this:
def parser[A](otherParser: BodyParser[A]): BodyParser[A] = BodyParser { request =>
val flow: Flow[ByteString, ByteString, Future[Boolean]] = Flow.fromGraph(new BodyValidator(request.headers.get("Some-Header")))
val parserSink: Sink[ByteString, Future[Either[Result, A]]] = otherParser.apply(request).toSink
Accumulator(flow.toMat(parserSink) { (statusFuture: Future[Boolean], resultFuture: Future[Either[Result, A]]) =>
statusFuture.flatMap { success =>
if (success) {
resultFuture.map {
case Left(result) => Left(result)
case Right(a) => Right(a)
}
} else {
Future.successful(Left(BadRequest))
}
}
})
}
The key line is this one:
val flow: Flow[ByteString, ByteString, Future[Boolean]] = Flow.fromGraph(new BodyValidator(request.headers.get("Some-Header")))
The rest kind of falls into place once you are able to create this flow. Unfortunately BodyValidator is pretty verbose and feels somewhat boilerplate-heavy. In any case, it's mostly pretty easy to read. GraphStageWithMaterializedValue expects you to implement def shape: S (S is FlowShape[ByteString, ByteString] here) to specify the input type and output type of this graph. It also expects you to implement def createLogicAndMaterializedValue(inheritedAttributes: Attributes): (GraphStageLogic, M) (M is a Future[Boolean] here) to define what the graph should actually do. Here's the full code of BodyValidator (I'll explain in more detail below):
class BodyValidator(expected: Option[String]) extends GraphStageWithMaterializedValue[FlowShape[ByteString, ByteString], Future[Boolean]] {
val in = Inlet[ByteString]("BodyValidator.in")
val out = Outlet[ByteString]("BodyValidator.out")
override def shape: FlowShape[ByteString, ByteString] = FlowShape.of(in, out)
override def createLogicAndMaterializedValue(inheritedAttributes: Attributes): (GraphStageLogic, Future[Boolean]) = {
val status = Promise[Boolean]()
val bodyBuffer = new ByteStringBuilder()
val logic = new GraphStageLogic(shape) {
setHandler(out, new OutHandler {
override def onPull(): Unit = pull(in)
})
setHandler(in, new InHandler {
def onPush(): Unit = {
val chunk = grab(in)
bodyBuffer.append(chunk)
push(out, chunk)
}
override def onUpstreamFinish(): Unit = {
val fullBody = bodyBuffer.result()
status.success(expected.map(ByteString(_)).contains(fullBody))
completeStage()
}
override def onUpstreamFailure(e: Throwable): Unit = {
status.failure(e)
failStage(e)
}
})
}
(logic, status.future)
}
}
You first want to create an Inlet and Outlet to set up the inputs and outputs for your graph:
val in = Inlet[ByteString]("BodyValidator.in")
val out = Outlet[ByteString]("BodyValidator.out")
Then you use these to define shape.
def shape: FlowShape[ByteString, ByteString] = FlowShape.of(in, out)
Inside createLogicAndMaterializedValue you need to initialize the value you intend to materialize. Here I've used a promise that can be resolved when I have the full data from the stream. I also create a ByteStringBuilder to track the data between iterations.
val status = Promise[Boolean]()
val bodyBuffer = new ByteStringBuilder()
Then I create a GraphStageLogic to actually set up what this graph does at each point of processing. Two handlers are being set. One is an InHandler for dealing with data as it comes from the upstream source. The other is an OutHandler for dealing with data to send downstream. There's nothing really interesting in the OutHandler, so I'll ignore it here except to say that it is necessary boilerplate in order to avoid an IllegalStateException. Three methods are overridden in the InHandler: onPush, onUpstreamFinish, and onUpstreamFailure. onPush is called when new data is ready from upstream. In this method I simply grab the next chunk of data, write it to bodyBuffer and push the data downstream.
def onPush(): Unit = {
val chunk = grab(in)
bodyBuffer.append(chunk)
push(out, chunk)
}
onUpstreamFinish is called when the upstream finishes (surprise). This is where the business logic of comparing the body with the header happens.
override def onUpstreamFinish(): Unit = {
val fullBody = bodyBuffer.result()
status.success(expected.map(ByteString(_)).contains(fullBody))
completeStage()
}
onUpstreamFailure is implemented so that when something goes wrong, I can mark the materialized future as failed as well.
override def onUpstreamFailure(e: Throwable): Unit = {
status.failure(e)
failStage(e)
}
Then I just return the GraphStageLogic I've created and status.future as a tuple.
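For comparison, a flow with the same shape and materialized type, Flow[ByteString, ByteString, Future[Boolean]], can also be assembled from built-in operators. The sketch below is an alternative to BodyValidator rather than the author's code: validatingFlow is a hypothetical name, the comparison is done on the UTF-8 decoded body, and it is run in a plain stream (Akka 2.6+ assumed) instead of inside a Play BodyParser.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import akka.util.ByteString
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Passes chunks through unchanged; a side sink accumulates them and the materialized
// Future[Boolean] reports whether the full body equals the expected header value.
def validatingFlow(expected: Option[String])(
    implicit ec: ExecutionContext): Flow[ByteString, ByteString, Future[Boolean]] =
  Flow[ByteString]
    .alsoToMat(Sink.fold[ByteString, ByteString](ByteString.empty)(_ ++ _))(Keep.right)
    .mapMaterializedValue(_.map(body => expected.contains(body.utf8String)))

// Assumes Akka 2.6+, where the implicit ActorSystem also provides the materializer.
implicit val system: ActorSystem = ActorSystem("validator-sketch")
import system.dispatcher

val (matchesFuture, bodyFuture) =
  Source(List(ByteString("hel"), ByteString("lo")))
    .viaMat(validatingFlow(Some("hello")))(Keep.right)
    .toMat(Sink.fold[ByteString, ByteString](ByteString.empty)(_ ++ _))(Keep.both)
    .run()

val matches: Boolean = Await.result(matchesFuture, 3.seconds)
val body: String = Await.result(bodyFuture, 3.seconds).utf8String
system.terminate()
```

Like BodyValidator, this buffers the whole body in memory, so the same size considerations apply.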

Closing an Akka stream from inside a GraphStage (Akka 2.4.2)

In Akka Stream 2.4.2, PushStage has been deprecated. For Streams 2.0.3 I was using the solution from this answer:
How does one close an Akka stream?
which was:
import akka.stream.stage._
val closeStage = new PushStage[Tpe, Tpe] {
override def onPush(elem: Tpe, ctx: Context[Tpe]) = elem match {
case elem if shouldCloseStream ⇒
// println("stream closed")
ctx.finish()
case elem ⇒
ctx.push(elem)
}
}
How would I close a stream in 2.4.2 immediately, from inside a GraphStage / onPush() ?
Use something like this:
val closeStage = new GraphStage[FlowShape[Tpe, Tpe]] {
val in = Inlet[Tpe]("closeStage.in")
val out = Outlet[Tpe]("closeStage.out")
override val shape = FlowShape.of(in, out)
override def createLogic(inheritedAttributes: Attributes) = new GraphStageLogic(shape) {
setHandler(in, new InHandler {
override def onPush() = grab(in) match {
case elem if shouldCloseStream ⇒
// println("stream closed")
completeStage()
case msg ⇒
push(out, msg)
}
})
setHandler(out, new OutHandler {
override def onPull() = pull(in)
})
}
}
It is more verbose, but on the one hand this logic can be defined in a reusable way, and on the other hand one no longer has to worry about differences between the stream elements, because the GraphStage can be handled in the same way as a flow would be handled:
val flow: Flow[Tpe, Tpe, NotUsed] = ???
val newFlow = flow.via(closeStage)
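When the close condition can be expressed as a predicate on the element, the built-in takeWhile operator gives the same "complete the stream from the middle" behaviour without a custom stage. This is a sketch of that alternative, assuming Akka 2.6+; shouldClose is a hypothetical stand-in for shouldCloseStream.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Assumes Akka 2.6+, where the implicit ActorSystem also provides the materializer.
implicit val system: ActorSystem = ActorSystem("take-while-sketch")

// Hypothetical stand-in for shouldCloseStream, expressed as a predicate on the element.
def shouldClose(elem: Int): Boolean = elem >= 4

// takeWhile completes downstream and cancels upstream at the first element that fails the test.
val seen: Seq[Int] = Await.result(
  Source(1 to 10).takeWhile(elem => !shouldClose(elem)).runWith(Sink.seq),
  3.seconds)

system.terminate()
```

As with the custom stage, the element that triggers the close is not emitted.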
Posting for other people's reference. sschaef's answer is procedurally correct, but in my case the connection was kept open for a minute and would eventually time out and throw a "no activity" exception, closing the connection.
Reading the docs further, I noticed that the connection is closed only when all upstream flows have completed. In my case, I had more than one upstream.
For my particular use case, the fix was to add eagerComplete = true so that the stream closes as soon as any (rather than all) upstream completes. Something like:
... = builder.add(Merge[MyObj](3,eagerComplete = true))
Hope this helps someone.
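The effect of eagerComplete can be observed in isolation with the merge operator on Source, which takes the same flag. The sketch below assumes Akka 2.6+ and merges a one-element source with Source.maybe, which never completes on its own; without eagerComplete = true the Await would time out.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Assumes Akka 2.6+, where the implicit ActorSystem also provides the materializer.
implicit val system: ActorSystem = ActorSystem("eager-complete-sketch")

// Source.maybe stays open until its promise is completed, which never happens here.
val merged = Source.single(1)
  .merge(Source.maybe[Int], eagerComplete = true)
  .runWith(Sink.seq)

// Completes because the merge finishes as soon as the first upstream finishes.
val result: Seq[Int] = Await.result(merged, 3.seconds)
system.terminate()
```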