NNTP client using Akka Streams in Scala

I'm trying to implement an NNTP client that streams a list of commands to the server and parses the results back. I'm facing several problems:
The NNTP protocol doesn't have a "unique" delimiter that could be used to frame results. Some commands return multi-line responses. How can this be handled with streams?
How do I "map" each command issued to the server's response, and wait for the end of that response before sending the next command? (Throttling is not relevant here.)
How do I stop the stream processing on disconnection? (Currently, the program never returns.)
Here is my current implementation:
import akka.stream._
import akka.stream.scaladsl._
import akka.{ NotUsed, Done }
import akka.actor.ActorSystem
import akka.util.ByteString
import scala.concurrent._
import scala.concurrent.duration._
import java.nio.file.Paths
import scala.io.StdIn
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Success
import scala.util.Failure
object AutomatedClient extends App {
implicit val system = ActorSystem("NewsClientTest")
implicit val materializer = ActorMaterializer()
// MODEL //
final case class Command(query: String)
final case class CommandResult(
resultCode: Int,
resultStatus: String,
resultList: Option[List[String]])
final case class ParseException(message: String) extends RuntimeException
// COMMAND HANDLING FUN //
// out ->
val sendCommand: Command => ByteString = c => ByteString(c.query + "\r\n")
// in <-
val parseCommandResultStatus: String => (Int, String) = s =>
(s.take(3).toInt, s.drop(3).trim)
val parseCommandResultList: List[String] => List[String] = l =>
l.foldLeft(List().asInstanceOf[List[String]]){
case (acc, ".") => acc
case (acc, e) => e.trim :: acc
}.reverse
val parseCommandResult: ByteString => Future[CommandResult] = b => Future {
val resultLines = b.decodeString("UTF-8").split("\r\n")
resultLines.length match {
case 0 => throw new ParseException("empty result")
case 1 =>
val (code, text) = parseCommandResultStatus(resultLines.head)
new CommandResult(code, text, None)
case _ =>
val (code, text) = parseCommandResultStatus(resultLines.head)
new CommandResult(code, text, Some(parseCommandResultList(resultLines.tail.toList)))
}
}
// STREAMS //
// Flows
val outgoing: Flow[Command, ByteString, NotUsed] = Flow fromFunction sendCommand
val incoming: Flow[ByteString, Future[CommandResult], NotUsed] = Flow fromFunction parseCommandResult
val protocol = BidiFlow.fromFlows(incoming, outgoing)
// Sink
val print: Sink[Future[CommandResult], _] = Sink.foreach(f =>
f.onComplete {
case Success(r) => println(r)
case Failure(r) => println("error decoding command result")
})
// Source
val testSource: Source[Command, NotUsed] = Source(List(
new Command("help"),
new Command("list"),
new Command("quit")
))
val (host, port) = ("localhost", 1119)
Tcp()
.outgoingConnection(host, port)
.join(protocol)
.runWith(testSource, print)
}
And here is the result output :
CommandResult(200,news.localhost NNRP Service Ready - newsmaster@localhost (posting ok),None)
CommandResult(100,Legal Commands,Some(List(article [<messageid>|number], authinfo type value, body [<messageid>|number], date, group newsgroup, head [<messageid>|number], help, last, list [active wildmat|active.times|counts wildmat], list [overview.fmt|newsgroups wildmat], listgroup newsgroup, mode reader, next, post, stat [<messageid>|number], xhdr field [range], xover [range], xpat field range pattern, xfeature useragent <client identifier>, xfeature compress gzip [terminator], xzver [range], xzhdr field [range], quit, 480 Authentication Required*, 205 Goodbye)))
We can see that the second CommandResult also contains the results of the "list" and "quit" commands (the trailing "480 Authentication Required*" and "205 Goodbye" entries).
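One way to approach the framing problem, sketched here without Akka so it can be run on its own: in NNTP a multi-line response is terminated by a line containing a single ".", and whether a response is multi-line depends on the command and its status code. The sketch below assumes the bytes have already been split into CRLF-delimited lines (in a stream, something like Framing.delimiter followed by a stateful stage could play that role). The names NntpFraming, multiLineCodes and frame are hypothetical, and the set of multi-line codes is an illustrative assumption covering only HELP (100) and LIST (215).

```scala
object NntpFraming {
  final case class CommandResult(code: Int, status: String, lines: Option[List[String]])

  // Assumption: only these status codes introduce a multi-line body.
  // A real client would derive this from the command that was sent.
  val multiLineCodes: Set[Int] = Set(100, 215)

  // Groups already CRLF-split lines into one CommandResult per server response.
  // Throws NumberFormatException if a status line does not start with three digits.
  def frame(lines: List[String]): List[CommandResult] = {
    @scala.annotation.tailrec
    def loop(rest: List[String], acc: List[CommandResult]): List[CommandResult] =
      rest match {
        case Nil => acc.reverse
        case statusLine :: tail =>
          val code = statusLine.take(3).toInt
          val status = statusLine.drop(3).trim
          if (multiLineCodes(code)) {
            // The body runs up to (not including) the terminating "." line.
            val (body, afterBody) = tail.span(_ != ".")
            // Undo dot-stuffing: a body line starting with "." carries one extra dot.
            val unstuffed = body.map(l => if (l.startsWith(".")) l.drop(1) else l)
            loop(afterBody.drop(1), CommandResult(code, status, Some(unstuffed)) :: acc)
          } else {
            loop(tail, CommandResult(code, status, None) :: acc)
          }
      }
    loop(lines, Nil)
  }
}
```

With the greeting, a HELP response and the two following status lines as input, this yields four separate results instead of one merged list. Pairing each result with the command that produced it would still need an extra step, for example zipping the framed responses with the commands in the order they were sent.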

Related

What is the proper way of acknowledging SQS message in Akka streams

Let's say I have a constant stream of SQS messages from AWS. I have a flow which takes messages and performs some side effects that are possibly unsafe.
I want to ACK all the messages: if the processing fails for any reason, I want to push the message to another queue; if it succeeds, I want to delete it. I don't want to leave any message un-ACKed.
I also want to be able to dispatch the incoming messages to parallelized workers.
What is the proper way of doing such a thing?
I came up with the following, which works, but I wonder if it is a "good" solution:
val flow = Flow[String]
.map { value =>
if (value == "global")
throw new Exception("Unexpected failure parsing global")
println(value)
}
val result = Source(
Seq("coucou", "baba", "global", "obvious", "test", "lol", "supercool")
)
.mapAsync(2) { el =>
Source
.single(el)
.via(flow)
.recover {
case ex: Throwable =>
println(s"SEND message ${el} to another queue")
}
.runWith(Sink.foreach { _ =>
println(s"DELETE message ${el}")
})
}
.runWith(Sink.ignore)
What do you think?
In the flow, instead of throwing, maybe you can simply wrap the result in an Either, so that Left and Right values can be handled differently by separate nok and ok sinks.
import akka.actor.ActorSystem
import akka.stream.SinkShape
import akka.stream.scaladsl.{Flow, GraphDSL, Partition, Sink, Source}
import scala.concurrent.ExecutionContext
object PartitionStream {
def main(args: Array[String]): Unit = {
implicit val system: ActorSystem = ActorSystem("PartitionStream")
implicit val ec: ExecutionContext = system.dispatcher
val flow = Flow[String].map { value =>
if (value == "global")
Left(MyException("Unexpected failure parsing global"))
else Right(value)
}
val sink = Sink.fromGraph(GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
val partition = b.add(
Partition[Either[MyException, String]](2, el => if (el.isLeft) 0 else 1)
)
val collectNok = Flow[Either[MyException, String]].collect {
case Left(ex) => ex
}
val collectOk = Flow[Either[MyException, String]].collect {
case Right(value) => value
}
val nok = b.add(Sink.foreach[MyException](x => println(x.msg)))
val ok = b.add(Sink.foreach[String](println))
partition.out(0) ~> collectNok ~> nok
partition.out(1) ~> collectOk ~> ok
SinkShape(partition.in)
})
Source(
List("coucou", "baba", "global", "obvious", "test", "lol", "supercool")
).via(flow).runWith(sink)
}
case class MyException(msg: String)
}
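The same Either-based routing can be sketched on plain collections, which may make the idea easier to see in isolation. This is only an illustration under assumptions: EitherRouting, ProcessingError, process and route are hypothetical names, and the failure rule simply mirrors the "global" check above. The sketch keeps the original message next to its result, so that a failed message can still be forwarded to another queue.

```scala
object EitherRouting {
  final case class ProcessingError(msg: String)

  // Returns Left on failure instead of throwing, mirroring the flow above.
  def process(value: String): Either[ProcessingError, String] =
    if (value == "global") Left(ProcessingError("Unexpected failure parsing global"))
    else Right(value)

  // Returns (messages to send to the other queue, messages to delete).
  def route(messages: List[String]): (List[String], List[String]) = {
    // Pair every message with its outcome so the original is never lost.
    val (failed, succeeded) = messages.map(m => m -> process(m)).partition(_._2.isLeft)
    (failed.map(_._1), succeeded.map(_._1))
  }
}
```

For example, EitherRouting.route(List("coucou", "global", "test")) separates "global" from the two messages that were processed successfully.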

Akka Streams: integrating an akka-http web request call into a stream

Getting started with Akka Streams I want to perform a simple computation. Extending the basic QuickStart https://doc.akka.io/docs/akka/2.5/stream/stream-quickstart.html with a call to a restful web api:
val source: Source[Int, NotUsed] = Source(1 to 100)
source.runForeach(println)
already works nicely to print the numbers. But when trying to create an Actor to perform the HTTP request (is this actually necessary?) according to https://doc.akka.io/docs/akka/2.5.5/scala/stream/stream-integrations.html
import akka.pattern.ask
implicit val askTimeout = Timeout(5.seconds)
val words: Source[String, NotUsed] =
Source(List("hello", "hi"))
words
.mapAsync(parallelism = 5)(elem => (ref ? elem).mapTo[String])
// continue processing of the replies from the actor
.map(_.toLowerCase)
.runWith(Sink.ignore)
I cannot get it to compile as the ? operator is not defined. As far as I know, it would only be defined inside an actor.
I also do not understand yet where exactly inside mapAsync my custom actor needs to be called.
edit
https://blog.colinbreck.com/backoff-and-retry-error-handling-for-akka-streams/ contains at least parts of an example.
It looks like it is not mandatory to create an actor i.e.
implicit val system = ActorSystem()
implicit val ec = system.dispatcher
implicit val materializer = ActorMaterializer()
val source = Source(List("232::03::14062::19965186", "232::03::14062::19965189"))
.map(cellKey => {
val splits = cellKey.split("::")
val mcc = splits(0)
val mnc = splits(1)
val lac = splits(2)
val ci = splits(3)
CellKeySource(cellKey, mcc, mnc, lac, ci)
})
.limit(2)
.mapAsyncUnordered(2)(ck => getResponse(ck.cellKey, ck.mobileCountryCode, ck.mobileNetworkCode, ck.locationArea, ck.cellKey)("<<myToken>>"))
def getResponse(cellKey: String, mobileCountryCode:String, mobileNetworkCode:String, locationArea:String, cellId:String)(token:String): Future[String] = {
RestartSource.withBackoff(
minBackoff = 10.milliseconds,
maxBackoff = 30.seconds,
randomFactor = 0.2,
maxRestarts = 2
) { () =>
val responseFuture: Future[HttpResponse] =
Http().singleRequest(HttpRequest(uri = s"https://www.googleapis.com/geolocation/v1/geolocate?key=${token}", entity = ByteString(
// TODO use proper JSON objects
s"""
|{
| "cellTowers": [
| "mobileCountryCode": $mobileCountryCode,
| "mobileNetworkCode": $mobileNetworkCode,
| "locationAreaCode": $locationArea,
| "cellId": $cellId,
| ]
|}
""".stripMargin)))
Source.fromFuture(responseFuture)
.mapAsync(parallelism = 1) {
case HttpResponse(StatusCodes.OK, _, entity, _) =>
Unmarshal(entity).to[String]
case HttpResponse(statusCode, _, _, _) =>
throw WebRequestException(statusCode.toString() )
}
}
.runWith(Sink.head)
.recover {
case _ => throw StreamFailedAfterMaxRetriesException()
}
}
val done: Future[Done] = source.runForeach(println)
done.onComplete(_ ⇒ system.terminate())
is already the (partial) answer for the question i.e. how to integrate Akka-streams + akka-http. However, it does not work, i.e. only throws error 400s and never terminates.
I think you have already found an API for calling the akka-http client.
Regarding your first code snippet, which doesn't work: I think there was a misunderstanding of the example itself. You expected the code in the example to work after simply copying it, but the intention of the docs was only to demonstrate a concept: how you can delegate a long-running task out of the stream flow and then consume the result when it's ready. The ask call to an Akka actor was used for this because ask returns a Future. The authors of the docs probably just omitted the definition of the actor. You can try this example:
import java.lang.System.exit
import akka.NotUsed
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.pattern.ask
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import akka.util.Timeout
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.language.higherKinds
object App extends scala.App {
implicit val sys: ActorSystem = ActorSystem()
implicit val mat: ActorMaterializer = ActorMaterializer()
val ref: ActorRef = sys.actorOf(Props[Translator])
implicit val askTimeout: Timeout = Timeout(5.seconds)
val words: Source[String, NotUsed] = Source(List("hello", "hi"))
words
.mapAsync(parallelism = 5)(elem => (ref ? elem).mapTo[String])
.map(_.toLowerCase)
.runWith(Sink.foreach(println))
.onComplete(t => {
println(s"finished: $t")
exit(1)
})
}
class Translator extends Actor {
override def receive: Receive = {
case msg => sender() ! s"$msg!"
}
}
You must import ask pattern from akka.
import akka.pattern.ask
Edit: OK, sorry, I can see that you have already imported it. What is ref in your code? An ActorRef?

Scala resolve multiple Futures and get a Map(String, AnyRef)

I am currently trying to resolve multiple futures at once, but as some of them may fail, I don't want the whole result to fail if one of them does. Instead, I want to end up with a Map(String, AnyRef) (meaning a Map from the future's name to the response converted to what I need).
Currently I have the following:
val fsResp = channelList.map {
channelRef => channelRef.ask(ReportStatus).mapTo[EventMessage]
}
Future.sequence(fsResp).onComplete{
case Success(resp: Seq[EventMessage]) =>
resp.foreach { event => Supervisor.foreach(_ ! event) }
val channels = loadConfiguredComponents()
.collect {
case ("processor" | "external", components) => components.map {
case (name, config: Channel) =>
(name, serializeDetails(config, resp.find(_.channel == ChannelName(name))))
}
}.flatten.toMap
val event = EventMessage(...)
Supervisor.foreach(_ ! event)
case Failure(exception) => originalSender ! replayError(exception.getMessage)
}
But this fails if any of those futures fails. So how can I end up with a Map(channelRef.path.name, event() | exception)?
Thanks!
You can use fallbackTo in order to avoid a Failure. In this example I change Future[T] to Future[Option[T]] in order to fallback to None, and then remove None elements.
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
def method(value:Int) = { Thread.sleep(2000); println(value); value }
println("start")
val eventualNone = Future.successful(None)
val futures = List(Future(method(1)), Future(method(2)), Future(method(3)), Future(throw new RuntimeException))
val withoutFailures = futures.map(_.map(Option.apply).fallbackTo(eventualNone))
Future.sequence(withoutFailures).map(_.flatten).onComplete {
case Success(values) => println(values)
case Failure(ex:Throwable) => println("FAIL")
}
Thread.sleep(5000)
output
start
1
3
2
List(1, 2, 3)
This can be changed to Either[Throwable, T] instead of Option[T] if you want to know what failed.
The combined Future will always be a Success, so you need to inspect the values to find out whether all the futures failed.
To capture successful/failed values from the list of Futures, you can first apply map/recover to each of them, then use Future.sequence to transform the result list into a Future of List[Either[Throwable,EventMessage]], as shown in the following trivialized example:
import scala.concurrent.{Future, Await}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
case class EventMessage(id: Int, msg: String)
val fsResp = List(
Future{EventMessage(1, "M1")}, Future{throw new Throwable()}, Future{EventMessage(3, "M3")}
)
val f = Future.sequence(
fsResp.map( _.map{ resp =>
// Do stuff with `resp`, add item to `Map()`, etc ...
Right(resp)
}.
recover{ case e: Throwable =>
// Log `exception` info, etc ...
Left(e)
} )
)
Await.result(f, Duration.Inf)
// f: scala.concurrent.Future[List[Product with Serializable with scala.util.
// Either[Throwable,EventMessage]]] = Future(Success(List(
// Right(EventMessage(1,M1)), Left(java.lang.Throwable), Right(EventMessage(3,M3))
// )))
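To get from the list of Either values to the name-keyed Map the question asks for, each future can carry its name along. The following is a minimal sketch using only the standard library (Future.transform and Try.toEither, available since Scala 2.12); CollectOutcomes, collectAll and demo are hypothetical names, and the channel names are made up for illustration.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Success

object CollectOutcomes {
  // Turns named futures into one future that never fails because of a single element:
  // each entry is Right(value) on success or Left(exception) on failure.
  def collectAll[T](named: Map[String, Future[T]]): Future[Map[String, Either[Throwable, T]]] =
    Future.traverse(named.toList) { case (name, fut) =>
      // transform sees both outcomes; wrapping in Success keeps the outer future successful.
      fut.transform(outcome => Success(name -> outcome.toEither))
    }.map(_.toMap)

  // Hypothetical usage with two made-up channel names.
  def demo(): Map[String, Either[Throwable, Int]] = {
    val named = Map(
      "channel-a" -> Future.successful(1),
      "channel-b" -> Future.failed[Int](new RuntimeException("boom"))
    )
    Await.result(collectAll(named), 5.seconds)
  }
}
```

In the original code the keys would presumably come from channelRef.path.name, paired with the corresponding ask future before the call to collectAll.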

Identify Akka HttpRequest and HttpResponse?

When using an Akka HttpRequest and piping the response to an actor, I can't identify which request a response belongs to.
The actor handles each message it receives, but it doesn't know which request produced a given response. Is there any way to identify each request so that the response can be matched with it?
Note: I can't make the server send any part of the request body back.
Thanks in advance
MySelf.scala
import akka.actor.{ Actor, ActorLogging }
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import akka.stream.{ ActorMaterializer, ActorMaterializerSettings }
import akka.util.ByteString
class Myself extends Actor with ActorLogging {
import akka.pattern.pipe
import context.dispatcher
final implicit val materializer: ActorMaterializer =
ActorMaterializer(ActorMaterializerSettings(context.system))
def receive = {
case HttpResponse(StatusCodes.OK, headers, entity, _) =>
entity.dataBytes.runFold(ByteString(""))(_ ++ _).foreach { body =>
log.info("Got response, body: " + body.utf8String)
}
case resp @ HttpResponse(code, _, _, _) =>
log.info("Request failed, response code: " + code)
resp.discardEntityBytes()
}
}
Main.scala
import akka.actor.{ActorSystem, Props}
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import akka.stream.ActorMaterializer
object HttpServerMain extends App {
import akka.pattern.pipe
// import system.dispatcher
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
// needed for the future flatMap/onComplete in the end
implicit val executionContext = system.dispatcher
val http = Http(system)
val myActor = system.actorOf(Props[Myself])
http.singleRequest(HttpRequest(uri = "http://akka.io"))
.pipeTo(myActor)
http.singleRequest(HttpRequest(uri = "http://akka.io/another-request"))
.pipeTo(myActor)
Thread.sleep(2000)
system.terminate()
}
You can simply use map to transform the Future and add some kind of ID (usually called correlation ID for such purposes) to it before you pipe it to myActor:
http.singleRequest(HttpRequest(uri = "http://akka.io"))
.map(x => (1, x)).pipeTo(myActor)
You'll need to change your pattern match blocks to take a tuple:
case (id, HttpResponse(StatusCodes.OK, headers, entity, _)) =>
If you can't/don't want to change your pattern match block for some reason you can use same approach, but instead add a unique HTTP header into your completed request (using copy) with something like this (not checked if compiles):
// make a unique header name that you are sure will not be
// received from http response:
val correlationHeader: HttpHeader = ... // mycustomheader
// Basically hack the response to add your header:
http.singleRequest(HttpRequest(uri = "http://akka.io"))
.map(x => x.copy(headers = correlationHeader +: x.headers)).pipeTo(myActor)
// Now you can check your header to see which response that was:
case HttpResponse(StatusCodes.OK, headers, entity, _) =>
headers.find(_.is("mycustomheader")).map(_.value).getOrElse("NA")
This is more of a hack though compared to previous option because you are modifying a response.
I think you cannot do that directly using pipeTo, because it essentially just adds an andThen call to your Future. One option is to map and then send a (request, response) tuple to the actor:
val request = HttpRequest(uri = "http://akka.io")
http.singleRequest(request).map {
response => myActor ! (request, response)
}
class Myself extends Actor with ActorLogging {
...
def receive = {
case (request, HttpResponse(StatusCodes.OK, headers, entity, _)) =>
...
case (request, resp @ HttpResponse(code, _, _, _)) =>
log.info(request.toString)
...
}
}
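The tagging idea from both answers can be sketched independently of akka-http: attach an identifier to each Future's result before handing it on. This is only an illustration with plain Futures standing in for HTTP calls; CorrelationSketch, Tagged and tagAll are hypothetical names, and using the list index as the correlation ID is an assumption.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object CorrelationSketch {
  final case class Tagged[A](correlationId: Int, value: A)

  // Starts every call and pairs each result with the index of the call that produced it.
  def tagAll[A](calls: List[() => Future[A]])(implicit ec: ExecutionContext): Future[List[Tagged[A]]] =
    Future.traverse(calls.zipWithIndex) { case (call, id) =>
      call().map(result => Tagged(id, result))
    }

  // Stand-ins for two HTTP requests; real code would call http.singleRequest here.
  def demo(): List[Tagged[String]] = {
    import ExecutionContext.Implicits.global
    val calls = List(() => Future("first response"), () => Future("second response"))
    Await.result(tagAll(calls), 5.seconds)
  }
}
```

A receiving actor can then match on the tagged message and look up the original request by its ID.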

How to retry failed Unmarshalling of a stream of akka-http requests?

I know it's possible to restart an akka-stream on error with a supervision strategy on the ActorMaterializer
val decider: Supervision.Decider = {
case _: ArithmeticException => Supervision.Resume
case _ => Supervision.Stop
}
implicit val materializer = ActorMaterializer(
ActorMaterializerSettings(system).withSupervisionStrategy(decider))
val source = Source(0 to 5).map(100 / _)
val result = source.runWith(Sink.fold(0)(_ + _))
// the element causing division by zero will be dropped
// result here will be a Future completed with Success(228)
source: http://doc.akka.io/docs/akka/2.4.2/scala/stream/stream-error.html
I have the following use case.
/***
scalaVersion := "2.11.8"
libraryDependencies ++= Seq(
"com.typesafe.akka" %% "akka-http-experimental" % "2.4.2",
"com.typesafe.akka" %% "akka-http-spray-json-experimental" % "2.4.2"
)
*/
import akka.http.scaladsl.unmarshalling.Unmarshal
import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
import spray.json._
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import Uri.Query
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl._
import scala.util.{Success, Failure}
import scala.concurrent.Await
import scala.concurrent.duration.Duration
import scala.concurrent.Future
object SO extends DefaultJsonProtocol {
implicit val system = ActorSystem()
import system.dispatcher
implicit val materializer = ActorMaterializer()
val httpFlow = Http().cachedHostConnectionPoolHttps[HttpRequest]("example.org")
def search(query: Char) = {
val request = HttpRequest(uri = Uri("https://example.org").withQuery(Query("q" -> query.toString)))
(request, request)
}
case class Hello(name: String)
implicit val helloFormat = jsonFormat1(Hello)
val searches =
Source('a' to 'z').map(search).via(httpFlow).mapAsync(1){
case (Success(response), _) => Unmarshal(response).to[Hello]
case (Failure(e), _) => Future.failed(e)
}
def main(): Unit = {
Await.result(searches.runForeach(println), Duration.Inf)
()
}
}
Sometimes a query will fail to unmarshal. I want to use a retry strategy on that single query
https://example.org/?q=v without restarting the whole alphabet.
I think it will be hard (or impossible) to implement this with a supervision strategy, mostly because you want to retry "n" times (according to the discussion in the comments), and I don't think you can track the number of times an element was tried when using supervision.
I think there are two ways to solve this issue: either handle the risky operation as a separate stream, or create a graph which does the error handling. I will propose both solutions.
Note also that Akka Streams distinguishes between errors and failures, so if you don't handle your failures they will eventually collapse the flow (if no strategy is introduced). In the examples below I therefore convert them to Either, which represents either success or error.
Separate stream
What you can do is to treat each alphabet letter as a separate stream and handle failures for each letter separately with the retry strategy, and some delay.
// this comes after your helloFormat
// note that the method is somewhat simpler because it uses the
// implicit dispatcher and the scheduler from the outer scope;
// you may also want to pass them as implicit arguments
import scala.concurrent.duration._
def retry[T](f: => Future[T], delay: FiniteDuration, c: Int): Future[T] =
f.recoverWith {
// you may want to only handle certain exceptions here...
case ex: Exception if c > 0 =>
println(s"failed - will retry ${c - 1} more times")
akka.pattern.after(delay, system.scheduler)(retry(f, delay, c - 1))
}
val singleElementFlow = httpFlow.mapAsync[Hello](1) {
case (Success(response), _) =>
val f = Unmarshal(response).to[Hello]
f.recoverWith {
case ex: Exception =>
// see https://github.com/akka/akka/issues/20192
response.entity.dataBytes.runWith(Sink.ignore).flatMap(_ => f)
}
case (Failure(e), _) => Future.failed(e)
}
// so the searches can either go ok or not, for each letter, we will retry up to 3 times
val searches =
Source('a' to 'z').map(search).mapAsync[Either[Throwable, Hello]](1) { elem =>
println(s"trying $elem")
retry(
Source.single(elem).via(singleElementFlow).runWith(Sink.head[Hello]),
1.seconds, 3
).map(ok => Right(ok)).recover { case ex => Left(ex) }
}
// end
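The core of the retry helper can be tried out without Akka. The following is a minimal sketch of the same recoverWith-based recursion, with the delay left out (the original uses akka.pattern.after for that). RetrySketch, attemptsLeft and the flaky operation are hypothetical and only serve to show the counting.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object RetrySketch {
  // Runs f and, if it fails with an Exception, runs it again up to attemptsLeft more times.
  // No delay between attempts; this only illustrates the recursion.
  def retry[T](attemptsLeft: Int)(f: () => Future[T])(implicit ec: ExecutionContext): Future[T] =
    f().recoverWith {
      case _: Exception if attemptsLeft > 0 => retry(attemptsLeft - 1)(f)
    }

  // A made-up operation that fails twice and succeeds on the third call.
  def demo(): (Int, Int) = {
    import ExecutionContext.Implicits.global
    val calls = new AtomicInteger(0)
    val flaky = () => Future {
      if (calls.incrementAndGet() < 3) throw new RuntimeException("not yet") else 42
    }
    val value = Await.result(retry(3)(flaky), 5.seconds)
    (value, calls.get())
  }
}
```

RetrySketch.demo() returns the value together with the number of calls that were needed, which makes it easy to check that only the failing element is re-run.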
Graph
This method will integrate failures into the graph, and will allow for retries. This example makes all requests run in parallel and prefers to retry those which failed; if you don't want this behaviour and would rather run them one by one, I believe that is also possible.
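// this comes after your helloFormat
// you may need to have your own class if you
// want to propagate failures for example, but we will use
// right value to keep track of how many times we have
// tried the request
// (FlowShape and Try are used below and need these imports)
import akka.stream.FlowShape
import scala.util.Try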
type ParseResult = Either[(HttpRequest, Int), Hello]
def search(query: Char): (HttpRequest, (HttpRequest, Int)) = {
val request = HttpRequest(uri = Uri("https://example.org").withQuery(Query("q" -> query.toString)))
(request, (request, 0)) // let's use this opaque value to count how many times we tried to search
}
val g = GraphDSL.create() { implicit b =>
import GraphDSL.Implicits._
val searches = b.add(Flow[Char])
val tryParse =
Flow[(Try[HttpResponse], (HttpRequest, Int))].mapAsync[ParseResult](1) {
case (Success(response), (req, tries)) =>
println(s"trying parse response to $req for $tries")
Unmarshal(response).to[Hello].
map(h => Right(h)).
recoverWith {
case ex: Exception =>
// see https://github.com/akka/akka/issues/20192
response.entity.dataBytes.runWith(Sink.ignore).map { _ =>
Left((req, tries + 1))
}
}
case (Failure(e), _) => Future.failed(e)
}
val broadcast = b.add(Broadcast[ParseResult](2))
val nonErrors = b.add(Flow[ParseResult].collect {
case Right(x) => x
// you may also handle here Lefts which do exceeded retries count
})
val errors = Flow[ParseResult].collect {
case Left(x) if x._2 < 3 => (x._1, x)
}
val merge = b.add(MergePreferred[(HttpRequest, (HttpRequest, Int))](1, eagerComplete = true))
// @formatter:off
searches.map(search) ~> merge ~> httpFlow ~> tryParse ~> broadcast ~> nonErrors
merge.preferred <~ errors <~ broadcast
// @formatter:on
FlowShape(searches.in, nonErrors.out)
}
def main(args: Array[String]): Unit = {
val source = Source('a' to 'z')
val sink = Sink.seq[Hello]
source.via(g).toMat(sink)(Keep.right).run().onComplete {
case Success(seq) =>
println(seq)
case Failure(ex) =>
println(ex)
}
}
Basically what happens here is we run searches through httpFlow and then try to parse the response, we
then broadcast the result and split errors and non-errors, the non errors go to sink, and errors get sent
back to the loop. If the number of retries exceed the count, we ignore the element, but you can also do
something else with it.
Anyway I hope this gives you some idea.
For the graph solution above, any retries for the last element in the stream won't execute. That's because when the upstream completes after sending the last element, the merge will also complete. After that, the only output can come from the non-retry outlet, but since the element goes to retry, that gets completed too.
If you need all input elements to generate an output, you'll need an extra mechanism to stop the upstream completion from reaching the process-and-retry graph. One possibility is to use a BidiFlow which monitors the input and output of the process-and-retry graph to ensure all the required outputs have been generated (for the observed inputs) before propagating the completion. In the simple case that could just be counting input and output elements.