How to retry failed Unmarshalling of a stream of akka-http requests? - scala

I know it's possible to restart an akka-stream on error with a supervision strategy on the ActorMaterializer:
val decider: Supervision.Decider = {
case _: ArithmeticException => Supervision.Resume
case _ => Supervision.Stop
}
implicit val materializer = ActorMaterializer(
ActorMaterializerSettings(system).withSupervisionStrategy(decider))
val source = Source(0 to 5).map(100 / _)
val result = source.runWith(Sink.fold(0)(_ + _))
// the element causing division by zero will be dropped
// result here will be a Future completed with Success(228)
source: http://doc.akka.io/docs/akka/2.4.2/scala/stream/stream-error.html
I have the following use case.
/***
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"com.typesafe.akka" %% "akka-http-experimental" % "2.4.2",
"com.typesafe.akka" %% "akka-http-spray-json-experimental" % "2.4.2"
)
*/
import akka.http.scaladsl.unmarshalling.Unmarshal
import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
import spray.json._
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import Uri.Query
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._
import scala.util.{Success, Failure}
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import scala.concurrent.Future
object SO extends DefaultJsonProtocol {
implicit val system = ActorSystem()
import system.dispatcher
implicit val materializer = ActorMaterializer()
val httpFlow = Http().cachedHostConnectionPoolHttps[HttpRequest]("example.org")
def search(query: Char) = {
val request = HttpRequest(uri = Uri("https://example.org").withQuery(Query("q" -> query.toString)))
(request, request)
}
case class Hello(name: String)
implicit val helloFormat = jsonFormat1(Hello)
val searches =
Source('a' to 'z').map(search).via(httpFlow).mapAsync(1){
case (Success(response), _) => Unmarshal(response).to[Hello]
case (Failure(e), _) => Future.failed(e)
}
def main(): Unit = {
Await.result(searches.runForeach(println), Duration.Inf)
()
}
}
Sometimes a query will fail to unmarshal. I want to use a retry strategy on that single query (for example https://example.org/?q=v) without restarting the whole alphabet.

I think it will be hard (or impossible) to implement this with a supervision strategy, mostly because you want to retry "n" times (according to the discussion in the comments), and I don't think you can track how many times an element has been tried when using supervision.
I think there are two ways to solve this issue: either handle the risky operation as a separate stream, or create a graph that does the error handling. I will propose both solutions.
Note also that Akka Streams distinguishes between errors and failures: if you don't handle your failures, they will eventually collapse the flow (unless a supervision strategy is introduced). In the examples below I therefore convert them to Either, which represents either a success or an error.
Separate stream
You can treat each letter of the alphabet as a separate stream and handle failures for each letter separately, using a retry strategy with some delay.
// this comes after your helloFormat
// requires: import scala.concurrent.duration._ (for FiniteDuration and 1.seconds)
// note that the method is somewhat simpler because it uses the implicit
// dispatcher and the scheduler from the outer scope;
// you may want to pass them as implicit arguments instead
def retry[T](f: => Future[T], delay: FiniteDuration, c: Int): Future[T] =
f.recoverWith {
// you may want to only handle certain exceptions here...
case ex: Exception if c > 0 =>
println(s"failed - will retry ${c - 1} more times")
akka.pattern.after(delay, system.scheduler)(retry(f, delay, c - 1))
}
val singleElementFlow = httpFlow.mapAsync[Hello](1) {
case (Success(response), _) =>
val f = Unmarshal(response).to[Hello]
f.recoverWith {
case ex: Exception =>
// see https://github.com/akka/akka/issues/20192
response.entity.dataBytes.runWith(Sink.ignore).flatMap(_ => f)
}
case (Failure(e), _) => Future.failed(e)
}
// so the searches can either go ok or not, for each letter, we will retry up to 3 times
val searches =
Source('a' to 'z').map(search).mapAsync[Either[Throwable, Hello]](1) { elem =>
println(s"trying $elem")
retry(
Source.single(elem).via(singleElementFlow).runWith(Sink.head[Hello]),
1.seconds, 3
).map(ok => Right(ok)).recover { case ex => Left(ex) }
}
// end
Graph
This method integrates failures into the graph and allows for retries. The example runs all requests in parallel and prefers to retry those that failed; if you don't want this behaviour and would rather run them one by one, I believe that is also possible.
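// this comes after your helloFormat
// additionally requires: import scala.util.Try and import akka.stream.FlowShape;
// httpFlow must be created as cachedHostConnectionPoolHttps[(HttpRequest, Int)],
// because the value carried through the pool is now (request, tries)
// you may need your own class if you want to propagate failures, for example;
// here the Left value keeps the request and how many times we have tried it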
type ParseResult = Either[(HttpRequest, Int), Hello]
def search(query: Char): (HttpRequest, (HttpRequest, Int)) = {
val request = HttpRequest(uri = Uri("https://example.org").withQuery(Query("q" -> query.toString)))
(request, (request, 0)) // let's use this opaque value to count how many times we tried to search
}
val g = GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
val searches = b.add(Flow[Char])
val tryParse =
Flow[(Try[HttpResponse], (HttpRequest, Int))].mapAsync[ParseResult](1) {
case (Success(response), (req, tries)) =>
println(s"trying parse response to $req for $tries")
Unmarshal(response).to[Hello].
map(h => Right(h)).
recoverWith {
case ex: Exception =>
// see https://github.com/akka/akka/issues/20192
response.entity.dataBytes.runWith(Sink.ignore).map { _ =>
Left((req, tries + 1))
}
}
case (Failure(e), _) => Future.failed(e)
}
val broadcast = b.add(Broadcast[ParseResult](2))
val nonErrors = b.add(Flow[ParseResult].collect {
case Right(x) => x
// you may also handle Lefts that have exceeded the retry count here
})
val errors = Flow[ParseResult].collect {
case Left(x) if x._2 < 3 => (x._1, x)
}
val merge = b.add(MergePreferred[(HttpRequest, (HttpRequest, Int))](1, eagerComplete = true))
// #formatter:off
searches.map(search) ~> merge ~> httpFlow ~> tryParse ~> broadcast ~> nonErrors
merge.preferred <~ errors <~ broadcast
// #formatter:on
FlowShape(searches.in, nonErrors.out)
}
def main(args: Array[String]): Unit = {
val source = Source('a' to 'z')
val sink = Sink.seq[Hello]
source.via(g).toMat(sink)(Keep.right).run().onComplete {
case Success(seq) =>
println(seq)
case Failure(ex) =>
println(ex)
}
}
Basically, we run the searches through httpFlow and then try to parse each response. We then broadcast the result and split it into errors and non-errors: the non-errors go to the sink, and the errors are sent back into the loop. If the number of retries exceeds the limit, the element is ignored, but you can also do something else with it.
Anyway I hope this gives you some idea.

For the graph solution above, retries for the last element in the stream won't execute. That's because when the upstream completes after sending the last element, the merge also completes (it is created with eagerComplete = true). After that, the only output can come from the non-retry outlet, but since the element goes to the retry path, that outlet gets completed too.
If you need every input element to produce an output, you'll need an extra mechanism that stops the upstream completion from reaching the process-and-retry graph. One possibility is a BidiFlow that monitors the input and output of that graph and propagates onComplete only once all the required outputs (for the observed inputs) have been generated. In the simple case, that could just mean counting input and output elements.

Related

What is the proper way of acknowledging SQS message in Akka streams

Let's say I have a constant stream of SQS messages from AWS. I have a flow that takes the messages and performs side effects that are possibly unsafe.
I want to ACK all the messages: if processing fails for any reason, I want to push the message to another queue; if it succeeds, I want to delete it. I don't want any message to be left un-ACKed.
I also want to be able to dispatch the incoming messages to parallel workers.
What is the proper way of doing such a thing?
I came up with the following, which works, but I wonder whether it is a "good" solution:
val flow = Flow[String]
.map { value =>
if (value == "global")
throw new Exception("Unexpected failure parsing global")
println(value)
}
val result = Source(
Seq("coucou", "baba", "global", "obvious", "test", "lol", "supercool")
)
.mapAsync(2) { el =>
Source
.single(el)
.via(flow)
.recover {
case ex: Throwable =>
println(s"SEND message ${el} to another queue")
}
.runWith(Sink.foreach { _ =>
println(s"DELETE message ${el}")
})
}
.runWith(Sink.ignore)
What do you think?
Instead of performing side effects inside flow, maybe you can simply wrap the result in an Either, so that Left and Right values can be handled differently by the nok and ok sinks.
import akka.actor.ActorSystem
import akka.stream.SinkShape
import akka.stream.scaladsl.{Flow, GraphDSL, Partition, Sink, Source}
import scala.concurrent.ExecutionContext
object PartitionStream {
def main(args: Array[String]): Unit = {
implicit val system: ActorSystem = ActorSystem("PartitionStream")
implicit val ec: ExecutionContext = system.dispatcher
val flow = Flow[String].map { value =>
if (value == "global")
Left(MyException("Unexpected failure parsing global"))
else Right(value)
}
val sink = Sink.fromGraph(GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
val partition = b.add(
Partition[Either[MyException, String]](2, el => if (el.isLeft) 0 else 1)
)
val collectNok = Flow[Either[MyException, String]].collect {
case Left(ex) => ex
}
val collectOk = Flow[Either[MyException, String]].collect {
case Right(value) => value
}
val nok = b.add(Sink.foreach[MyException](x => println(x.msg)))
val ok = b.add(Sink.foreach[String](println))
partition.out(0) ~> collectNok ~> nok
partition.out(1) ~> collectOk ~> ok
SinkShape(partition.in)
})
Source(
List("coucou", "baba", "global", "obvious", "test", "lol", "supercool")
).via(flow).runWith(sink)
}
case class MyException(msg: String)
}

Scala resolve multiple Futures and get a Map(String, AnyRef)

I am currently trying to resolve multiple futures at once, but since some of them may fail, I don't want the whole result to fail when one of them does. Instead, I want to end up with a Map(String, AnyRef) (meaning a Map from the future's name to the response converted to what I need).
Currently I have the following:
val fsResp = channelList.map {
channelRef => channelRef.ask(ReportStatus).mapTo[EventMessage]
}
Future.sequence(fsResp).onComplete{
case Success(resp: Seq[EventMessage]) =>
resp.foreach { event => Supervisor.foreach(_ ! event) }
val channels = loadConfiguredComponents()
.collect {
case ("processor" | "external", components) => components.map {
case (name, config: Channel) =>
(name, serializeDetails(config, resp.find(_.channel == ChannelName(name))))
}
}.flatten.toMap
val event = EventMessage(...)
Supervisor.foreach(_ ! event)
case Failure(exception) => originalSender ! replayError(exception.getMessage)
}
But this fails if any of those futures fails. So how can I end up with a Map(channelRef.path.name, event() | exception)?
Thanks!
You can use fallbackTo in order to avoid a Failure. In this example I change Future[T] to Future[Option[T]] in order to fallback to None, and then remove None elements.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
def method(value:Int) = { Thread.sleep(2000); println(value); value }
println("start")
val eventualNone = Future.successful(None)
val futures = List(Future(method(1)), Future(method(2)), Future(method(3)), Future(throw new RuntimeException))
val withoutFailures = futures.map(_.map(Option.apply).fallbackTo(eventualNone))
Future.sequence(withoutFailures).map(_.flatten).onComplete {
case Success(values) => println(values)
case Failure(ex:Throwable) => println("FAIL")
}
Thread.sleep(5000)
Output:
start
1
3
2
List(1, 2, 3)
This can be changed to Either[Throwable, T] instead of Option[T] if you want to know what failed.
The combined Future always completes with Success, so you need to inspect the values to find out whether all the futures failed.
To capture successful/failed values from the list of Futures, you can first apply map/recover to each of them, then use Future.sequence to transform the result list into a Future of List[Either[Throwable,EventMessage]], as shown in the following trivialized example:
import scala.concurrent.{Future, Await}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
case class EventMessage(id: Int, msg: String)
val fsResp = List(
Future{EventMessage(1, "M1")}, Future{throw new Throwable()}, Future{EventMessage(3, "M3")}
)
val f = Future.sequence(
fsResp.map( _.map{ resp =>
// Do stuff with `resp`, add item to `Map()`, etc ...
Right(resp)
}.
recover{ case e: Throwable =>
// Log `exception` info, etc ...
Left(e)
} )
)
Await.result(f, Duration.Inf)
// f: scala.concurrent.Future[List[Product with Serializable with scala.util.
// Either[Throwable,EventMessage]]] = Future(Success(List(
// Right(EventMessage(1,M1)), Left(java.lang.Throwable), Right(EventMessage(3,M3))
// )))
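Since the question asks for a Map keyed by name, the same map/recover idea can be sketched as below. This is only a sketch under assumptions: the object NamedResults and the helper collectByName are hypothetical names, and the keys stand in for channelRef.path.name from the question.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object NamedResults {
  // Hypothetical helper: turns named futures into one future that never fails.
  // Each name maps to Right(result) on success or Left(error) on failure.
  def collectByName[T](named: Map[String, Future[T]]): Future[Map[String, Either[Throwable, T]]] =
    Future.sequence(
      named.toList.map { case (name, f) =>
        f.map(v => name -> (Right(v): Either[Throwable, T]))
          .recover { case e => name -> (Left(e): Either[Throwable, T]) }
      }
    ).map(_.toMap)
}
```

With something like channelList.map(ref => ref.path.name -> ref.ask(ReportStatus).mapTo[EventMessage]).toMap as input, the resulting map can be inspected per channel without one failure hiding the other results.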

Akka Streams Error Handling. How to know which row failed?

I read this article on akka streams error handling
http://doc.akka.io/docs/akka/2.5.4/scala/stream/stream-error.html
and wrote this code.
val decider: Supervision.Decider = {
case _: Exception => Supervision.Restart
case _ => Supervision.Stop
}
implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer(ActorMaterializerSettings(actorSystem).withSupervisionStrategy(decider))
val source = Source(1 to 10)
val flow = Flow[Int].map{x => if (x != 9) 2 * x else throw new Exception("9!")}
val sink : Sink[Int, Future[Done]] = Sink.foreach[Int](x => println(x))
val graph = RunnableGraph.fromGraph(GraphDSL.create(sink){implicit builder => s =>
import GraphDSL.Implicits._
source ~> flow ~> s.in
ClosedShape
})
val future = graph.run()
future.onComplete{ _ =>
actorSystem.terminate()
}
Await.result(actorSystem.whenTerminated, Duration.Inf)
This works very well .... except that I need to scan the output to see which row did not get processed. Is there a way for me to print/log the row which failed? [Without putting explicit try/catch blocks in each and every flow that I write?]
For example, if I were using actors (as opposed to streams), I could have hooked into an actor life-cycle event and logged when the actor restarted, along with the message that was being processed at the time of the restart.
But here I am not using actors explicitly (although they are used internally). Are there life-cycle events for a Flow / Source / Sink?
Just a small modification to your code:
val decider: Supervision.Decider = {
case e: Exception =>
println("Exception handled, recovering stream:" + e.getMessage)
Supervision.Restart
case _ => Supervision.Stop
}
If you put meaningful information into the exceptions thrown in the stream (the failing row, for example), you can print it in the supervision decider.
I used println to keep the answer short, but I strongly recommend using a logging library such as scala-logging.
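One way to get the failing row into the exception without a try/catch in every flow is a single reusable wrapper. The following is a minimal sketch, not part of Akka: ElementFailed, tagged and RowLogging are hypothetical names, and it assumes the same ActorMaterializer-based API as the question (deprecated in newer Akka versions). It uses Resume so the failed element is dropped and the rest continue.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, Supervision}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.control.NonFatal

// Hypothetical wrapper exception (not an Akka class) that remembers the failing element.
final case class ElementFailed(element: Any, cause: Throwable)
  extends RuntimeException(s"element $element failed: ${cause.getMessage}", cause)

object RowLogging {
  // Wraps a stage function once so that any failure carries the offending element.
  def tagged[A, B](f: A => B): A => B = a =>
    try f(a) catch { case NonFatal(e) => throw ElementFailed(a, e) }

  val decider: Supervision.Decider = {
    case ElementFailed(element, cause) =>
      println(s"Skipping $element: ${cause.getMessage}")
      Supervision.Resume // drop the failed element and keep going
    case _ => Supervision.Stop
  }

  // Runs the question's doubling flow and returns the elements that got through.
  def run(): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("row-logging")
    implicit val materializer: ActorMaterializer = ActorMaterializer(
      ActorMaterializerSettings(system).withSupervisionStrategy(decider))
    try {
      val doubled = Source(1 to 10)
        .map(tagged((x: Int) => if (x != 9) 2 * x else throw new Exception("9!")))
        .runWith(Sink.seq)
      Await.result(doubled, 5.seconds)
    } finally system.terminate()
  }
}
```

The wrapper is still a try/catch, but it is written once and applied per stage, so the decider can log the exact element for any flow that uses it.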

Akka streams: Reading multiple files

I have a list of files. I want:
To read from all of them as a single Source.
Files should be read sequentially, in-order. (no round-robin)
At no point should any file be required to be entirely in memory.
An error reading from a file should collapse the stream.
It felt like this should work: (Scala, akka-streams v2.4.7)
val sources = Seq("file1", "file2").map(new File(_)).map(f => FileIO.fromPath(f.toPath)
.via(Framing.delimiter(ByteString(System.lineSeparator), 10000, allowTruncation = true))
.map(bs => bs.utf8String)
)
val source = sources.reduce( (a, b) => Source.combine(a, b)(MergePreferred(_)) )
source.map(_ => 1).runWith(Sink.reduce[Int](_ + _)) // counting lines
But that results in a compile error since FileIO has a materialized value associated with it, and Source.combine doesn't support that.
Mapping the materialized value away makes me wonder how file-read errors get handled, but does compile:
val sources = Seq("file1", "file2").map(new File(_)).map(f => FileIO.fromPath(f.toPath)
.via(Framing.delimiter(ByteString(System.lineSeparator), 10000, allowTruncation = true))
.map(bs => bs.utf8String)
.mapMaterializedValue(f => NotUsed.getInstance())
)
val source = sources.reduce( (a, b) => Source.combine(a, b)(MergePreferred(_)) )
source.map(_ => 1).runWith(Sink.reduce[Int](_ + _)) // counting lines
But throws an IllegalArgumentException at runtime:
java.lang.IllegalArgumentException: requirement failed: The inlets [] and outlets [MergePreferred.out] must correspond to the inlets [MergePreferred.preferred] and outlets [MergePreferred.out]
The code below is not as terse as it could be, in order to clearly modularize the different concerns.
// Given a stream of bytestrings delimited by the system line separator we can get lines represented as Strings
val lines = Framing.delimiter(ByteString(System.lineSeparator), 10000, allowTruncation = true).map(bs => bs.utf8String)
// Given a stream of Paths we read those files and count the number of lines
val lineCounter = Flow[Path].flatMapConcat(path => FileIO.fromPath(path).via(lines)).fold(0L)((count, line) => count + 1).toMat(Sink.head)(Keep.right)
// Here's our test data source (replace paths with real paths)
val testFiles = Source(List("somePathToFile1", "somePathToFile2").map(new File(_).toPath))
// Runs the line counter over the test files, returns a Future, which contains the number of lines, which we then print out to the console when it completes
testFiles runWith lineCounter foreach println
Update Oh, I didn't see the accepted answer because I didn't refresh the page >_<. I'll leave this here anyway since I've also added some notes about error handling.
I believe the following program does what you want:
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, IOResult}
import akka.stream.scaladsl.{FileIO, Flow, Framing, Keep, Sink, Source}
import akka.util.ByteString
import scala.concurrent.{Await, Future}
import scala.util.{Failure, Success}
import scala.util.control.NonFatal
import java.nio.file.Paths
import scala.concurrent.duration._
object TestMain extends App {
implicit val actorSystem = ActorSystem("test")
implicit val materializer = ActorMaterializer()
implicit def ec = actorSystem.dispatcher
val sources = Vector("build.sbt", ".gitignore")
.map(Paths.get(_))
.map(p =>
FileIO.fromPath(p)
.viaMat(Framing.delimiter(ByteString(System.lineSeparator()), Int.MaxValue, allowTruncation = true))(Keep.left)
.mapMaterializedValue { f =>
f.onComplete {
case Success(r) if r.wasSuccessful => println(s"Read ${r.count} bytes from $p")
case Success(r) => println(s"Something went wrong when reading $p: ${r.getError}")
case Failure(NonFatal(e)) => println(s"Something went wrong when reading $p: $e")
}
NotUsed
}
)
val finalSource = Source(sources).flatMapConcat(identity)
val result = finalSource.map(_ => 1).runWith(Sink.reduce[Int](_ + _))
result.onComplete {
case Success(n) => println(s"Read $n lines total")
case Failure(e) => println(s"Reading failed: $e")
}
Await.ready(result, 10.seconds)
actorSystem.terminate()
}
The key here is the flatMapConcat() method: it transforms each element of a stream into a source and returns a stream of the elements yielded by these sources, as if they were run sequentially.
As for handling errors, you can either add a handler to the future in the mapMaterializedValue argument, or you can handle the final error of the running stream by putting a handler on the future materialized by the sink (Sink.reduce in the example). I did both in the example above, and if you test it on, say, a nonexistent file, you'll see that the same error message is printed twice. Unfortunately, flatMapConcat() does not collect materialized values, and frankly I can't see how it could do so sanely, so you have to handle them separately if necessary.
I do have one answer out of the gate - don't use akka.FileIO. This appears to work fine, for example:
val sources = Seq("sample.txt", "sample2.txt").map(io.Source.fromFile(_).getLines()).reduce(_ ++ _)
val source = Source.fromIterator[String](() => sources)
val lineCount = source.map(_ => 1).runWith(Sink.reduce[Int](_ + _))
I'd still like to know whether there's a better solution.
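One caveat with the iterator approach above is that it opens every file up front and never closes them. A possible refinement, shown here only as a sketch using the standard library, opens each file when the previous one is exhausted and closes it afterwards. SequentialLines and linesOf are hypothetical names, and UTF-8 encoding is an assumption.

```scala
import java.nio.file.{Files, Path}
import scala.io.Source

object SequentialLines {
  // Lazily concatenates the lines of the given files, in order.
  // A file is opened only when iteration reaches it and closed once exhausted.
  def linesOf(paths: Seq[Path]): Iterator[String] =
    paths.iterator.flatMap { p =>
      val src = Source.fromFile(p.toFile, "UTF-8")
      new Iterator[String] {
        private val underlying = src.getLines()
        private var closed = false
        def hasNext: Boolean = {
          val more = !closed && underlying.hasNext
          if (!more && !closed) { src.close(); closed = true }
          more
        }
        def next(): String = underlying.next()
      }
    }
}
```

The result can be passed to Source.fromIterator(() => SequentialLines.linesOf(paths)) in the same way as in the snippet above. A file that is abandoned midway (for example, when the stream is cancelled) is not closed by this sketch.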

Scala Akka Stream: How to Pass Through a Seq

I'm trying to wrap some blocking calls in a Future. The return type is Seq[User], where User is a case class. The following just wouldn't compile, with complaints about various overloaded versions being present. Any suggestions? I tried almost all the variations of Source.apply without any luck.
// All I want is Seq[User] => Future[Seq[User]]
def findByFirstName(firstName: String) = {
val users: Seq[User] = userRepository.findByFirstName(firstName)
val sink = Sink.fold[User, User](null)((_, elem) => elem)
val src = Source(users) // doesn't compile
src.runWith(sink)
}
First of all, I assume that you are using version 1.0 of akka-http-experimental, since the API may have changed from previous releases.
Your code does not compile because akka.stream.scaladsl.Source$.apply() requires a
scala.collection.immutable.Seq, whereas the plain Seq type is the more general scala.collection.Seq.
Therefore you have to convert the sequence to an immutable one using the to[T] method.
Document: akka.stream.scaladsl.Source
Additionally, as the documentation shows, Source$.apply() also accepts a () => Iterator[T], so you can pass () => users.iterator as the argument instead.
Since Sink.fold(...) materializes the final accumulated value, you can pass an empty Seq() as the initial value, append each element to the sequence as it arrives, and get the complete sequence at the end.
However, there might be a better solution, namely a Sink that collects every element into a Seq, but I could not find it.
The following code works.
import akka.actor._
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Source,Sink}
import scala.concurrent.ExecutionContext.Implicits.global
case class User(name:String)
object Main extends App{
implicit val system = ActorSystem("MyActorSystem")
implicit val materializer = ActorMaterializer()
val users = Seq(User("alice"),User("bob"),User("charlie"))
val sink = Sink.fold[Seq[User], User](Seq())(
(seq, elem) =>
{println(s"elem => ${elem} \t| seq => ${seq}");seq:+elem})
val src = Source(users.to[scala.collection.immutable.Seq])
// val src = Source(()=>users.iterator) // this also works
val fut = src.runWith(sink) // Future[Seq[User]]
// foreach runs only on success; onSuccess is deprecated in newer Scala versions
fut.foreach { x =>
println(s"result => ${x}")
}
}
The output of the code above is:
elem => User(alice) | seq => List()
elem => User(bob) | seq => List(User(alice))
elem => User(charlie) | seq => List(User(alice), User(bob))
result => List(User(alice), User(bob), User(charlie))
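Regarding the sink that collects every element into a Seq: later Akka Streams releases provide Sink.seq, which materializes a Future of an immutable Seq. The sketch below assumes such a release (not the 1.0 version assumed above) and the ActorMaterializer-based setup used in this answer; CollectWithSinkSeq and collect are hypothetical names.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object CollectWithSinkSeq {
  case class User(name: String)

  // Runs the users through a stream and collects them again with Sink.seq.
  def collect(users: immutable.Seq[User]): immutable.Seq[User] = {
    implicit val system: ActorSystem = ActorSystem("sink-seq")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val fut: Future[immutable.Seq[User]] = Source(users).runWith(Sink.seq)
      Await.result(fut, 5.seconds) // blocking here only to keep the sketch short
    } finally system.terminate()
  }
}
```

Compared with the Sink.fold version, there is no accumulator to write by hand, and the element order is preserved.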
If you just need a Future[Seq[User]], don't use Akka Streams; use futures:
import scala.concurrent._
import ExecutionContext.Implicits.global
val session = socialNetwork.createSessionFor("user", credentials)
val f: Future[List[Friend]] = Future {
session.getFriends()
}
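Applied to the question's method, this could look like the following sketch. UserRepository is a hypothetical stand-in for the question's userRepository, and marking the call with blocking is an assumption about how the repository behaves; it only hints to the execution context that the thread may be parked.

```scala
import scala.concurrent.{Await, ExecutionContext, Future, blocking}
import scala.concurrent.duration._

object BlockingLookup {
  case class User(name: String)

  // Hypothetical stand-in for the question's userRepository.
  trait UserRepository {
    def findByFirstName(firstName: String): Seq[User]
  }

  // Seq[User] => Future[Seq[User]] without any stream involved.
  def findByFirstName(repo: UserRepository, firstName: String)(
      implicit ec: ExecutionContext): Future[Seq[User]] =
    Future {
      blocking { repo.findByFirstName(firstName) }
    }
}
```

For heavily blocking repositories, a dedicated execution context passed as ec is probably a better fit than the global one.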