Convert HttpEntity.Chunked to Array[String] - scala

I have the following problem.
I am querying a server for some data and getting it back as HttpEntity.Chunked.
The response String looks like this with up to 10.000.000 lines like this:
[{"name":"param1","value":122343,"time":45435345},
{"name":"param2","value":243,"time":4325435},
......]
Now I want to get the incoming data into and Array[String] where each String is a line from the response, because later on it should be imported into an apache spark dataframe.
Currently I am doing it likes this:
//For the http request
trait StartHttpRequest {
implicit val system: ActorSystem
implicit val materializer: ActorMaterializer
def httpRequest(data: String, path: String, targetPort: Int, host: String): Future[HttpResponse] = {
val connectionFlow: Flow[HttpRequest, HttpResponse, Future[OutgoingConnection]] = {
Http().outgoingConnection(host, port = targetPort)
}
val responseFuture: Future[HttpResponse] =
Source.single(RequestBuilding.Post(uri = path, entity = HttpEntity(ContentTypes.`application/json`, data)))
.via(connectionFlow)
.runWith(Sink.head)
responseFuture
}
}
//result of the request
val responseFuture: Future[HttpResponse] = httpRequest(.....)
//convert to string
responseFuture.flatMap { response =>
response.status match {
case StatusCodes.OK =>
Unmarshal(response.entity).to[String]
}
}
//and then something like this, but with even more stupid stuff
responseFuture.onSuccess { str:String =>
masterActor! str.split("""\},\{""")
}
My question is, what would be a better way to get the result into an array?
How can I unmarshall the response entity directly? Because .to[Array[String]] for example did not work. And because there are so many lines coming, could I do it with a stream, to be more efficent?

Answering your questions out of order:
How can I unmarshall the response entity directly?
There is an existing question & answer related to unmarshalling an Array of case classes.
what would be a better way to get the result into an array?
I would take advantage of the Chunked nature and use streams. This allows you to do string processing and json parsing concurrently.
First you need a container class and parser:
case class Data(name : String, value : Int, time : Long)
object MyJsonProtocol extends DefaultJsonProtocol {
implicit val dataFormat = jsonFormat3(Data)
}
Then you have to do some manipulations to get the json objects to look right:
//Drops the '[' and the ']' characters
val dropArrayMarkers =
Flow[ByteString].map(_.filterNot(b => b == '['.toByte || b == ']'.toByte))
val preppendBrace =
Flow[String].map(s => if(!s.startsWith("{")) "{" + s else s)
val appendBrace =
Flow[String].map(s => if(!s.endsWith("}")) s + "}" else s)
val parseJson =
Flow[String].map(_.parseJson.convertTo[Data])
Finally, combine these Flows to convert a Source of ByteString into a Source of Data objects:
def strSourceToDataSource(source : Source[ByteString,_]) : Source[Data, _] =
source.via(dropArrayMarkers)
.via(Framing.delimiter(ByteString("},{"), 256, true))
.map(_.utf8String)
.via(prependBrace)
.via(appendBrace)
.via(parseJson)
This source can then be drained into an Seq of Data objects:
val dataSeq : Future[Seq[Data]] =
responseFuture flatMap { response =>
response.status match {
case StatusCodes.OK =>
strSourceToDataSource(response.entity.dataBytes).runWith(Sink.seq)
}
}

Related

Akka flow for multiple http requests

In a project of mine I have an akka actor for sending post requests to my google fcm server. The actor takes a list of ids and should make as many requests as there are in the list. I print out the response from the server in runForeach(println(_)) but I only get one printout for a whole list of ids. Why does this happen?
class FCMActor(val key: String) extends Actor{
import fcm.FCMActor._
import akka.pattern.pipe
import context.dispatcher
private implicit def system: ActorSystem = ActorSystem()
final implicit val materializer: ActorMaterializer = ActorMaterializer(ActorMaterializerSettings(context.system))
def buildBody(id: Option[String]): String = {
Json.obj(
"to" -> id,
"priority" -> "high",
"data" -> Json.obj("message" -> "Firebase Clud Message"),
"time_to_live" -> 60
).toString()
}
def buildHttpRequest(body: String): HttpRequest = {
HttpRequest(method = HttpMethods.POST,
uri = s"/fcm/send",
entity = HttpEntity(MediaTypes.`application/json`, body),
headers = List(RawHeader("Authorization", s"key=$key")))
}
val connectionFlow: Flow[HttpRequest, HttpResponse, Future[Http.OutgoingConnection]] = {
Http().outgoingConnection("fcm.googleapis.com")
}
def send(ids: List[Option[String]]) = {
val httpRequests: List[HttpRequest] = ids.map(buildBody).map(buildHttpRequest)
println(httpRequests)
Source(httpRequests).via(connectionFlow).runForeach(println(_)) // << here I only get one println
}
override def receive: Receive = {
case SendToIds(ids: List[Option[String]]) =>
send(ids)
}
}
You are not consuming the response entity that the server sends you. To understand why this is important, check out the related docs page.
A quick code change to try and fix this is:
... .runForeach{ response =>
response.discardEntityBytes()
println(response)
}
Or, if you're actually interested in the entity, something along the lines of
... .runForeach{ _.entity.dataBytes
.runFold(ByteString.empty) { case (acc, b) => acc ++ b }
.map(println(_))
}

Scala Play 2.5.0 2.5.1 Filtering with access to result body

I didn't find a way to get the body within a Filter in Play 2.5.x.
I want to create a "BadRequestLogFilter", which should log the request AND the result, if my appliation returns a status code 400-500
In Play 2.4.x, I used Iteratees and it worked.
I was not able to migrate this piece of code to Play 2.5.x. Can somebody give me a hint here? Maybe the hole approach to get the body in an Filter is an bad idea?
Here is my old (in 2.4.x working properly) Filter for Play 2.4.x
class BadRequestLogFilter #Inject() (implicit val mat: Materializer, ec: ExecutionContext) extends Filter {
val logger = Logger("bad_status").underlyingLogger
override def apply(next: (RequestHeader) => Future[Result])(request: RequestHeader): Future[Result] = {
val resultFuture = next(request)
resultFuture.foreach(result => {
val status = result.header.status
if (status < 200 || status >= 400) {
val c = Try(request.tags(Router.Tags.RouteController))
val a = Try(request.tags(Router.Tags.RouteActionMethod))
val body = result.body.run(Iteratee.fold(Array.empty[Byte]) { (memo, nextChunk) => memo ++ nextChunk })
val futResponse = body.map(bytes => new String(bytes))
futResponse.map { response =>
val m = Map("method" -> request.method,
"uri" -> request.uri,
"status" -> status,
"response" -> response,
"request" -> request,
"controller" -> c.getOrElse("empty"),
"actionMethod" -> a.getOrElse("empty"))
val msg = m.map { case (k, v) => s"$k=$v" }.mkString(", ")
logger.info(appendEntries(m), msg)
}
}
})
resultFuture
}
}
I guess I just need an valid replacement for this line here:
val body = result.body.run(Iteratee.fold(Array.empty[Byte]) { (memo, nextChunk) => memo ++ nextChunk })
The body of a result in Play 2.5.x is of type HttpEntity
So once you have the result you can get the body and then materialize it:
val body = result.body.consumeData(mat)
Here mat is the implicit Materializer you have. This is going to return you a Future[ByteString] which you can then decode to get a String representation (I have omitted future handling here for simplicity):
val bodyAsString = body.decodeString("UTF-8")
logger.info(bodyAsString)

How to download a HTTP resource to a file with Akka Streams and HTTP?

Over the past few days I have been trying to figure out the best way to download a HTTP resource to a file using Akka Streams and HTTP.
Initially I started with the Future-Based Variant and that looked something like this:
def downloadViaFutures(uri: Uri, file: File): Future[Long] = {
val request = Get(uri)
val responseFuture = Http().singleRequest(request)
responseFuture.flatMap { response =>
val source = response.entity.dataBytes
source.runWith(FileIO.toFile(file))
}
}
That was kind of okay but once I learnt more about pure Akka Streams I wanted to try and use the Flow-Based Variant to create a stream starting from a Source[HttpRequest]. At first this completely stumped me until I stumbled upon the flatMapConcat flow transformation. This ended up a little more verbose:
def responseOrFail[T](in: (Try[HttpResponse], T)): (HttpResponse, T) = in match {
case (responseTry, context) => (responseTry.get, context)
}
def responseToByteSource[T](in: (HttpResponse, T)): Source[ByteString, Any] = in match {
case (response, _) => response.entity.dataBytes
}
def downloadViaFlow(uri: Uri, file: File): Future[Long] = {
val request = Get(uri)
val source = Source.single((request, ()))
val requestResponseFlow = Http().superPool[Unit]()
source.
via(requestResponseFlow).
map(responseOrFail).
flatMapConcat(responseToByteSource).
runWith(FileIO.toFile(file))
}
Then I wanted to get a little tricky and use the Content-Disposition header.
Going back to the Future-Based Variant:
def destinationFile(downloadDir: File, response: HttpResponse): File = {
val fileName = response.header[ContentDisposition].get.value
val file = new File(downloadDir, fileName)
file.createNewFile()
file
}
def downloadViaFutures2(uri: Uri, downloadDir: File): Future[Long] = {
val request = Get(uri)
val responseFuture = Http().singleRequest(request)
responseFuture.flatMap { response =>
val file = destinationFile(downloadDir, response)
val source = response.entity.dataBytes
source.runWith(FileIO.toFile(file))
}
}
But now I have no idea how to do this with the Future-Based Variant. This is as far as I got:
def responseToByteSourceWithDest[T](in: (HttpResponse, T), downloadDir: File): Source[(ByteString, File), Any] = in match {
case (response, _) =>
val source = responseToByteSource(in)
val file = destinationFile(downloadDir, response)
source.map((_, file))
}
def downloadViaFlow2(uri: Uri, downloadDir: File): Future[Long] = {
val request = Get(uri)
val source = Source.single((request, ()))
val requestResponseFlow = Http().superPool[Unit]()
val sourceWithDest: Source[(ByteString, File), Unit] = source.
via(requestResponseFlow).
map(responseOrFail).
flatMapConcat(responseToByteSourceWithDest(_, downloadDir))
sourceWithDest.runWith(???)
}
So now I have a Source that will emit one or more (ByteString, File) elements for each File (I say each File since there is no reason the original Source has to be a single HttpRequest).
Is there anyway to take these and route them to a dynamic Sink?
I'm thinking something like flatMapConcat, such as:
def runWithMap[T, Mat2](f: T => Graph[SinkShape[Out], Mat2])(implicit materializer: Materializer): Mat2 = ???
So that I could complete downloadViaFlow2 with:
def destToSink(destination: File): Sink[(ByteString, File), Future[Long]] = {
val sink = FileIO.toFile(destination, true)
Flow[(ByteString, File)].map(_._1).toMat(sink)(Keep.right)
}
sourceWithDest.runWithMap {
case (_, file) => destToSink(file)
}
The solution does not require a flatMapConcat. If you don't need any return values from the file writing then you can use Sink.foreach:
def writeFile(downloadDir : File)(httpResponse : HttpResponse) : Future[Long] = {
val file = destinationFile(downloadDir, httpResponse)
httpResponse.entity.dataBytes.runWith(FileIO.toFile(file))
}
def downloadViaFlow2(uri: Uri, downloadDir: File) : Future[Unit] = {
val request = HttpRequest(uri=uri)
val source = Source.single((request, ()))
val requestResponseFlow = Http().superPool[Unit]()
source.via(requestResponseFlow)
.map(responseOrFail)
.map(_._1)
.runWith(Sink.foreach(writeFile(downloadDir)))
}
Note that the Sink.foreach creates Futures from the writeFile function. Therefore there's not much back-pressure involved. The writeFile could be slowed down by the hard drive but the stream would keep generating Futures. To control this you can use Flow.mapAsyncUnordered (or Flow.mapAsync) :
val parallelism = 10
source.via(requestResponseFlow)
.map(responseOrFail)
.map(_._1)
.mapAsyncUnordered(parallelism)(writeFile(downloadDir))
.runWith(Sink.ignore)
If you want to accumulate the Long values for a total count you need to combine with a Sink.fold:
source.via(requestResponseFlow)
.map(responseOrFail)
.map(_._1)
.mapAsyncUnordered(parallelism)(writeFile(downloadDir))
.runWith(Sink.fold(0L)(_ + _))
The fold will keep a running sum and emit the final value when the source of requests has dried up.
Using the play Web Services client injected in ws, remmebering to import scala.concurrent.duration._:
def downloadFromUrl(url: String)(ws: WSClient): Future[Try[File]] = {
val file = File.createTempFile("my-prefix", new File("/tmp"))
file.deleteOnExit()
val futureResponse: Future[WSResponse] =
ws.url(url).withMethod("GET").withRequestTimeout(5 minutes).stream()
futureResponse.flatMap { res =>
res.status match {
case 200 =>
val outputStream = java.nio.file.Files.newOutputStream(file.toPath)
val sink = Sink.foreach[ByteString] { bytes => outputStream.write(bytes.toArray) }
res.bodyAsSource.runWith(sink).andThen {
case result =>
outputStream.close()
result.get
} map (_ => Success(file))
case other => Future(Failure[File](new Exception("HTTP Failure, response code " + other + " : " + res.statusText)))
}
}
}

How to perform a simple json post with spray-json in spray?

I'm trying to perform a simple json post with spray. But it seems that i can get an http entity for a json object that can be Marshall.
here is my error:
[error]
...../IdeaProjects/PoolpartyConnector/src/main/scala/org/iadb/poolpartyconnector/thesaurusoperation/ThesaurusCacheService.scala:172:
could not find implicit value for evidence parameter of type
spray.httpx.marshalling.Marshaller[spray.json.JsValue]
[error] val request =
Post(s"$thesaurusapiEndpoint/$coreProjectId/suggestFreeConcept?",
suggestionJsonBody)
and the code that comes with it:
override def createSuggestedFreeConcept(suggestedPrefLabel: String, lang: String, scheme: String, b: Boolean): String = {
import system.dispatcher
import spray.json._
val pipeline = addCredentials(BasicHttpCredentials("superadmin", "poolparty")) ~> sendReceive
val label = LanguageLiteral(suggestedPrefLabel, lang)
val suggestion = SuggestFreeConcept(List(label), b, Some(List(scheme)), None, None,None, None)
val suggestionJsonBody = suggestion.toJson
val request = Post(s"$thesaurusapiEndpoint/$coreProjectId/suggestFreeConcept?", suggestionJsonBody)
val res = pipeline(request)
getSuggestedFromFutureHttpResponse(res) match {
case None => ""
case Some(e) => e
}
}
Please, does any one has an idea of what is going on with the implicit marshaller. I though spray Json would come with implicit marshaller.
I assume you already have a custom Json Protocol somewhere so that suggestion.toJson works correctly?
Try the following:
val body = HttpEntity(`application/json`, suggestionJsonBody.prettyPrint)
val request = Post(s"$thesaurusapiEndpoint/$coreProjectId/suggestFreeConcept?", body)
you could also use compactPrint rather than prettyPrint, in either case, it turns the Json into a string containing the json information.
Here is how i solved it:
override def createSuggestedFreeConcepts(suggestedPrefLabels: List[LanguageLiteral], scheme: String, checkDuplicates: Boolean): List[String] = {
import system.dispatcher
import spray.httpx.marshalling._
import spray.httpx.SprayJsonSupport._
val pipeline = addCredentials(BasicHttpCredentials("superadmin", "poolparty")) ~> sendReceive
suggestedPrefLabels map { suggestedPrefLabel =>
val suggestion = SuggestFreeConcept(List(suggestedPrefLabel), checkDuplicates, Some(List(Uri(scheme))), None, None, None, None)
val request = Post(s"$thesaurusapiEndpoint/$coreProjectId/suggestFreeConcept", marshal(suggestion))
val res = pipeline(request)
getSuggestedFromFutureHttpResponse(res) match {
case None => ""
case Some(e) => e
}
}
}
the key is:
import spray.httpx.marshalling._ import spray.httpx.SprayJsonSupport._
and
val request =
Post(s"$thesaurusapiEndpoint/$coreProjectId/suggestFreeConcept",
marshal(suggestion))
I marshall suggestion. The explanation is not super super straightforward. But by fetching around in the doc, it is explained.

How combine data from several responses in Scala Play2?

I need do some requests to different URLs, get data from their responses and put this info in one list, but i have some misunderstanding in this theme.
1) for one request i do
def doRequest: Future[WSResponse] = {
client
.url("MY_URL")
.withRequestTimeout(5000)
.get()}
Then I parse json in response to List of my objects:
def list: Future[List[FoobarEntry]] = {
doRequest.map {
response => {
val json = response.json \ "foobar"
json.validate[List[FoobarEntry]] match {
case js:JsSuccess[List[FoobarEntry]]=>
js.get
case e:JsError => Logger.error(JsError.toFlatJson(e).toString()); List()
}
}
}}
I think that for several url i should write some look like
def doRequests: List[Future[WSResponse]] = {
List(client
.url("URL_1")
.withRequestTimeout(5000)
.get(),
client
.url("URL_2")
.withRequestTimeout(5000)
.get())}
But how parse this list of Future[WSResponse] like my def list: Future[List[FoobarEntry]]?
Since you put the future responses inside of a list, you will have to map each future response with the parsing logic that turns it into a FoobarEntry. Like so:
val responseFutures: List[Future[WSResponse]] = ???
val foobarFutures: List[Future[FoobarEntry]] =
responseFutures.map(future => future.map(response => parse(response)))
Now you have a list of future parsed responses, but to do something when all of them has arrived you will need to sequence that list:
val futureFoobars = Future.sequence(foobarFutures)
So, sequence helps you to get from C[Future[A]] to Future[C[A]]
Use for-comprehensions.
val request1 = client.url("URL_1").withRequestTimeout(5000).get()
val request2 = client.url("URL_2").withRequestTimeout(5000).get()
val result: Future[List[FoobarEntry]] = for {
res1: WSResponse <- request1
res2: WSResponse <- request2
} yield List(res1, res2).map(parse).flatten
def parse(response: WSResponse): List[FoobarEntry] = {
val json = response.json \ "foobar"
json.validate[List[FoobarEntry]] match {
case js:JsSuccess[List[FoobarEntry]]=>
js.get
case e:JsError => Logger.error(JsError.toFlatJson(e).toString())
List()
}