Futures to send concurrent HTTP GET requests - scala

Suppose I am writing a function to send a few concurrent HTTP GET requests and wait for all the responses with a time-out. If at least one response does not have status 200 or does not come within the time-out my function should return failure.
I am writing this function tryGets like this:
import java.net.URL
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.util.Try
def unsafeGet(url: URL): String = {
  val in = url.openStream()
  scala.io.Source.fromInputStream(in).mkString
}
def futureGet(url: URL)
             (implicit ec: ExecutionContext): Future[String] = Future {
  unsafeGet(url)
}
def tryGets(urls: Seq[URL], timeOut: Duration)
           (implicit ec: ExecutionContext): Try[Seq[String]] = Try {
  val fut = Future.sequence(urls.map(futureGet))
  Await.result(fut, timeOut)
}
Does it make sense?
Doesn't it leak future instances in case of a time-out?

If one of the futures does not complete within the time-out, the remaining futures will continue to execute, because futures are eager and keep running on the ExecutionContext once they have been created. What you could do instead is fold over the URLs, but this will execute the requests serially.
urls.foldLeft(Future.successful(Seq.empty[String])) { (future, url) =>
  future.flatMap(accum => futureGet(url).map(accum :+ _))
}
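To make the difference concrete, here is a minimal sketch contrasting the two approaches. It assumes a hypothetical fakeGet in place of futureGet so that it runs without network access; GetSketch, concurrently and serially are illustrative names only.

```scala
import scala.concurrent.{ExecutionContext, Future}

object GetSketch {
  // Hypothetical stand-in for futureGet so the sketch runs without network access.
  def fakeGet(url: String)(implicit ec: ExecutionContext): Future[String] =
    Future(s"body of $url")

  // Concurrent: every future is created, and therefore started, before they are combined.
  def concurrently(urls: Seq[String])(implicit ec: ExecutionContext): Future[Seq[String]] =
    Future.sequence(urls.map(url => fakeGet(url)))

  // Serial: each request is started only after the previous one has succeeded,
  // so a failure prevents the remaining requests from being started at all.
  def serially(urls: Seq[String])(implicit ec: ExecutionContext): Future[Seq[String]] =
    urls.foldLeft(Future.successful(Seq.empty[String])) { (acc, url) =>
      acc.flatMap(bodies => fakeGet(url).map(bodies :+ _))
    }
}
```

Both variants return the bodies in the order of urls. Either result can be passed to Await.result inside a Try, as tryGets does; only the serial variant avoids starting requests after an earlier one has failed.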

Related

Timeout Akka Streams Flow

I'm trying to use completionTimeout in an Akka Streams flow. I've provided a contrived example where the flow takes 10 seconds, but I've added a completionTimeout with a timeout of 1 second. I would expect this flow to time out after 1 second. However, in the example the flow completes in 10 seconds without any errors.
Why doesn't the flow time out? Is there a better way to time out a flow?
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import org.scalatest.{FlatSpec, Matchers}
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
class Test extends FlatSpec with Matchers {
  implicit val system = ActorSystem("test")
  "This Test" should "fail but passes and I don't know why" in {
    // This takes 10 seconds to complete
    val flow: Flow[String, String, NotUsed] = Flow[String]
      .map(str => {
        println(s"Processing ${str}")
        Thread.sleep(10000)
      })
      .map(_ => {"Done!"})
    val future: Future[String] =
      Source.single("Input")
        .via(flow)
        .completionTimeout(1 second) // Set a timeout of 1 second
        .runWith(Sink.last)
    val result = Await.result(future, 15 seconds)
    result should be("Done!")
  }
}
When executing a stream, Akka Streams uses operator fusion to run the fused stream operators on a single underlying actor for optimal performance. The blocking Thread.sleep therefore most likely occupies the same actor that would have to handle the timeout. For your main thread to catch the timeout, you could introduce asynchrony by means of .async:
val future: Future[String] =
  Source.single("Input")
    .via(flow)
    .async // <--- asynchronous boundary
    .completionTimeout(1 second)
    .runWith(Sink.last)
future.onComplete(println)
// Processing Input
// Failure(java.util.concurrent.TimeoutException: The stream has not been completed in 1 second.)
An alternative to introduce asynchrony is to use the mapAsync flow stage:
val flow: Flow[String, String, NotUsed] = Flow[String]
  .map(str => {
    println(s"Processing ${str}")
    Thread.sleep(10000)
  })
  .mapAsync(1)(_ => Future("Done!")) // <--- asynchronous flow stage
Despite getting the same timeout error, you may notice that it takes ~10s to see the result when using mapAsync, but only ~1s when using async. That's because while mapAsync introduces an asynchronous flow stage, it does not introduce an asynchronous boundary (as async does) and is still subject to operator fusion.

Scala - How to use a Timer without blocking on Futures with Await.result

I have a REST API provided by akka-http. In some cases I need to get data from an external database (Apache HBase), and I would like the query to fail if the database takes too long to deliver the data.
One naïve way is to wrap the call inside a Future and then block on it with Await.result for the needed duration.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
object AsyncTest1 extends App {
  val future = Future {
    getMyDataFromDB()
  }
  val myData = Await.result(future, 100.millis)
}
This seems to be inefficient, as this implementation needs two threads. Is there an efficient way to do this?
I have another use case where I want to send multiple queries in parallel and then aggregate the results, with the same delay limitation.
val future1 = Future {
  getMyDataFromDB1()
}
val future2 = Future {
  getMyDataFromDB2()
}
val foldedFuture = Future.fold(
  Seq(future1, future2))(MyAggregatedData)(myAggregateFunction)
val myData = Await.result(foldedFuture, 100.millis)
Same question here: what is the most efficient way to implement this?
Thanks for your help
One solution would be to use Akka's after function, which lets you pass a duration after which a future completes with an exception or whatever you want.
Take a look here. It demonstrates how to implement this.
EDIT:
I guess I'll post the code here in case the link gets broken in future:
import scala.concurrent._
import scala.concurrent.duration._
import ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
import akka.actor.ActorSystem
import akka.pattern.after
val system = ActorSystem("theSystem")
// The slow computation that should be cut off
lazy val f = Future { Thread.sleep(2000); true }
// A future that fails once the timeout has elapsed
lazy val t = after(duration = 1.second, using = system.scheduler)(Future.failed(new TimeoutException("Future timed out!")))
// Whichever of the two completes first wins
val fWithTimeout = Future firstCompletedOf Seq(f, t)
fWithTimeout.onComplete {
  case Success(x) => println(x)
  case Failure(error) => println(error)
}
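The same idea can also be sketched with the standard library alone. The following is a minimal sketch, not Akka's implementation: TimeoutSketch and withTimeout are hypothetical names, and a single-threaded ScheduledExecutorService is assumed as a stand-in for the Akka scheduler.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit, TimeoutException}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration.FiniteDuration

object TimeoutSketch {
  // One daemon thread whose only job is to fire timeouts; it never runs the queries.
  private val scheduler = Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "timeout-scheduler")
      t.setDaemon(true)
      t
    }
  })

  // Completes with f's result, or fails with a TimeoutException if f has not
  // completed within `timeout`. No thread blocks while waiting.
  def withTimeout[A](f: Future[A], timeout: FiniteDuration)
                    (implicit ec: ExecutionContext): Future[A] = {
    val p = Promise[A]()
    val task = scheduler.schedule(new Runnable {
      def run(): Unit = {
        p.tryFailure(new TimeoutException(s"Future timed out after $timeout"))
        ()
      }
    }, timeout.toMillis, TimeUnit.MILLISECONDS)
    f.onComplete { result =>
      task.cancel(false) // the timer is no longer needed
      p.tryComplete(result)
    }
    p.future
  }
}
```

For the multi-query case, the combined future can be wrapped in the same way, for example TimeoutSketch.withTimeout(Future.sequence(Seq(future1, future2)), 100.millis). As with firstCompletedOf, the underlying computation is not cancelled when the timeout fires; only the caller stops waiting for it.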

AkkaHttp: Process incoming requests in parallel with multiple processes

Using AkkaHttp with Scala, the following code provides an endpoint for /api/endpoint/{DoubleNumber}. Querying this endpoint triggers a heavy computation and then returns the result as application/json.
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
object Run {
  def main(args: Array[String]): Unit = {
    implicit val system = ActorSystem("myApi")
    implicit val materializer = ActorMaterializer()
    implicit val executionContext = system.dispatcher
    val e = get {
      // Each path segment is matched separately
      path("api" / "endpoint" / DoubleNumber) { myNumberArgument =>
        val result = someHeavyComputation(myNumberArgument)
        complete(HttpEntity(ContentTypes.`application/json`, result.toString))
      }
    }
  }
}
If one sends several concurrent requests from, say, a browser's console, the above code will wait for each request to be completed (and the response returned) before starting to handle the next one.
How can I fix the above code so that requests are handled in parallel, in other words so that an additional process is launched for each incoming request if previous requests are still being processed?
Looks like I found an answer.
If you have the same problem, simply call someHeavyComputation from inside the complete() block, not before:
val e = get {
  path("api" / "endpoint" / DoubleNumber) { myNumberArgument =>
    complete {
      val result = someHeavyComputation(myNumberArgument)
      HttpEntity(ContentTypes.`application/json`, result.toString)
    }
  }
}
New processes will be launched as necessary.

akka HttpResponse read body as String scala

So I have a function with this signature (akka.http.model.HttpResponse):
def apply(query: Seq[(String, String)], accept: String): HttpResponse
I simply get a value in a test like:
val resp = TagAPI(Seq.empty[(String, String)], api.acceptHeader)
I want to check its body in a test something like:
resp.entity.asString == "tags"
My question is: how can I get the response body as a string?
import akka.http.scaladsl.unmarshalling.Unmarshal
implicit val system = ActorSystem("System")
implicit val materializer = ActorFlowMaterializer()
val responseAsString: Future[String] = Unmarshal(entity).to[String]
Since Akka HTTP is streams-based, the entity is streaming as well. If you really need the entire string at once, you can convert the incoming request into a Strict one.
This is done by using the toStrict(timeout: FiniteDuration)(mat: Materializer) API to collect the request into a strict entity within a given time limit (this is important since you don't want to "try to collect the entity forever" in case the incoming request never actually ends):
import akka.stream.ActorFlowMaterializer
import akka.actor.ActorSystem
import akka.util.ByteString
import scala.concurrent.Future
implicit val system = ActorSystem("Sys") // your actor system, only 1 per app
implicit val materializer = ActorFlowMaterializer() // you must provide a materializer
import system.dispatcher
import scala.concurrent.duration._
val timeout = 300.millis
val bs: Future[ByteString] = entity.toStrict(timeout).map { _.data }
val s: Future[String] = bs.map(_.utf8String) // if you indeed need a `String`
You can also try this:
responseObject.entity.dataBytes.runFold(ByteString(""))(_ ++ _).map(_.utf8String) map println
Unmarshaller.stringUnmarshaller(someHttpEntity)
works like a charm; an implicit materializer is needed as well.
Here is a simple directive that extracts a string from the request's body:
def withString(): Directive1[String] = {
  extractStrictEntity(3.seconds).flatMap { entity =>
    provide(entity.data.utf8String)
  }
}
Unfortunately, in my case Unmarshal to String didn't work properly, complaining about: Unsupported Content-Type, supported: application/json. That would be a more elegant solution, but I had to use another way. In my test I took the Future extracted from the response entity and used Await (from scala.concurrent) to get the result:
Put("/post/item", requestEntity) ~> route ~> check {
  val responseContent: Future[Option[String]] =
    response.entity.dataBytes.map(_.utf8String).runWith(Sink.lastOption)
  val content: Option[String] = Await.result(responseContent, 10.seconds)
  content.get should be(errorMessage)
  response.status should be(StatusCodes.InternalServerError)
}
If you need to go through all lines in a response, you can use runForeach of Source:
response.entity.dataBytes.map(_.utf8String).runForeach(data => println(data))
Here is my working example:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._
import akka.stream.ActorMaterializer
import akka.util.ByteString
import scala.concurrent.Future
import scala.util.{ Failure, Success }
def getDataAkkaHTTP: Unit = {
  implicit val system = ActorSystem()
  implicit val materializer = ActorMaterializer()
  // needed for the future flatMap/onComplete in the end
  implicit val executionContext = system.dispatcher
  val url = "http://localhost:8080/"
  val responseFuture: Future[HttpResponse] = Http().singleRequest(HttpRequest(uri = url))
  responseFuture.onComplete {
    case Success(res) => {
      val HttpResponse(statusCodes, headers, entity, _) = res
      println(entity)
      entity.dataBytes.runFold(ByteString(""))(_ ++ _).foreach (body => println(body.utf8String))
      system.terminate()
    }
    case Failure(_) => sys.error("something wrong")
  }
}

Exceptions Thrown by Await#result

Given the following code:
import akka.actor.ActorSystem
import spray.http._
import spray.client.pipelining._
import scala.concurrent.Future
implicit val system = ActorSystem()
import system.dispatcher // execution context for futures
val pipeline: HttpRequest => Future[HttpResponse] = sendReceive
val response: Future[HttpResponse] = pipeline(Get("http://spray.io/"))
The following pseudo-code function waits 10 seconds, returning "GOOD" if the HttpResponse is returned, or "BAD" on an Await#result exception (see docs).
import java.util.concurrent.TimeoutException
import scala.concurrent.Await
import scala.concurrent.duration._
def f(fut: Future[HttpResponse]): String = {
  try {
    val result = Await.result(fut, 10.seconds)
    "GOOD"
  }
  catch {
    case e @ (_: InterruptedException | _: IllegalArgumentException
              | _: TimeoutException) => "BAD"
  }
}
In my catch, is it only necessary to catch the exceptions thrown by Await#result? In other words, are there any possible exceptions I am not catching here?
Await.result itself can throw the exceptions you caught; however, if the future it awaits does not complete successfully, it rethrows the exception contained in the future. You might want to read the Blocking section from here: Futures and Promises.
So yes, there may be exceptions you aren't catching: anything that can result from the failed computation of an HttpResponse.
Blocking in real code is usually bad and should be done only for testing purposes, but if you really need to, I would recommend wrapping the Await in a scala.util.Try so you can manipulate it elegantly later and also keep the information about when and why it failed.
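As a minimal sketch of that suggestion, the following wraps Await.result in a Try and distinguishes a timeout from a failure carried by the future itself. AwaitSketch, describe and the returned labels are hypothetical names chosen for illustration. Note that Try only captures non-fatal exceptions, so an InterruptedException would still propagate.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.FiniteDuration
import scala.util.{Failure, Success, Try}

object AwaitSketch {
  // Waits for the future and classifies the outcome instead of letting exceptions escape.
  def describe[A](fut: Future[A], atMost: FiniteDuration): String =
    Try(Await.result(fut, atMost)) match {
      case Success(_)                   => "GOOD"
      case Failure(_: TimeoutException) => "BAD: timed out"
      // Any other failure is the exception the future itself completed with.
      case Failure(e)                   => s"BAD: future failed with ${e.getClass.getSimpleName}"
    }
}
```

With the code from the question, this could be called as AwaitSketch.describe(response, 10.seconds).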