Spray/Scala - Setting a timeout on a specific request

I currently have a REST call set up using spray pipeline. If I don't get a response within x number of seconds, I want it to timeout, but only on that particular call. When making a spray client pipeline request, is there a good way to specify a timeout specific to that particular call?

As far as I can tell, as of spray-client 1.3.1 there is no way to customise the pipe after it has been created.
However, you can create custom pipes for different types of requests.
It's worth mentioning that the timeouts defined below apply to the ask() calls, not to the network operations, but I guess this is what you need from your description.
I found the following article very useful in understanding a bit better how the library works behind the scenes: http://kamon.io/teamblog/2014/11/02/understanding-spray-client-timeout-settings/
Disclaimer: I haven't actually tried this, but I guess it should work:
import akka.actor.ActorRefFactory
import akka.util.Timeout
import spray.client.pipelining._
import spray.http.{HttpRequest, HttpResponse}
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._

val timeout1 = Timeout(5.minutes)
val timeout2 = Timeout(1.minute)
val pipeline1: HttpRequest => Future[HttpResponse] =
  sendReceive(implicitly[ActorRefFactory], implicitly[ExecutionContext], timeout1)
val pipeline2: HttpRequest => Future[HttpResponse] =
  sendReceive(implicitly[ActorRefFactory], implicitly[ExecutionContext], timeout2)
And then you use the appropriate pipeline for each request.
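For instance (a hypothetical usage sketch; the URLs are mine, and the Get request builder comes in with the spray.client.pipelining import):
// Hypothetical usage: the slow reporting endpoint gets the generous timeout,
// the quick health check gets the short one.
val slowResponse: Future[HttpResponse] = pipeline1(Get("http://example.com/report"))
val fastResponse: Future[HttpResponse] = pipeline2(Get("http://example.com/ping"))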

Related

Akka Streams - Backpressure for Source.unfoldAsync

I'm currently trying to read a paginated HTTP resource. Each page is a Multipart document, and the response for a page includes a next link in the headers if there is a page with more content. An automated parser can then start at the oldest page and read page by page, using the headers to construct the request for the next page.
I'm using Akka Streams and Akka Http for the implementation, because my goal is to create a streaming solution. I came up with this (I will include only the relevant parts of the code here; feel free to have a look at this gist for the whole code):
def read(request: HttpRequest): Source[HttpResponse, _] =
  Source.unfoldAsync[Option[HttpRequest], HttpResponse](Some(request))(Crawl.crawl)

val parse: Flow[HttpResponse, General.BodyPart, _] = Flow[HttpResponse]
  .flatMapConcat(r => Source.fromFuture(Unmarshal(r).to[Multipart.General]))
  .flatMapConcat(_.parts)

...
def crawl(reqOption: Option[HttpRequest]): Future[Option[(Option[HttpRequest], HttpResponse)]] = reqOption match {
  case Some(req) =>
    Http().singleRequest(req).map { response =>
      if (response.status.isFailure()) Some((None, response))
      else nextRequest(response, HttpMethods.GET)
    }
  case None => Future.successful(None)
}
So the general idea is to use Source.unfoldAsync to crawl through the pages and to do the HTTP requests (the idea and implementation are very close to what's described in this answer). This will create a Source[HttpResponse, _] that can then be consumed (unmarshalled to Multipart, split up into the individual parts, ...).
My problem now is that the consumption of the HttpResponses might take a while (unmarshalling takes some time if the pages are large, and maybe there will be some database requests at the end to persist some data). So I would like Source.unfoldAsync to backpressure if the downstream is slower. By default, the next HTTP request is started as soon as the previous one has finished.
So my question is: Is there some way to make Source.unfoldAsync backpressure on a slow downstream? If not, is there an alternative that makes backpressuring possible?
I can imagine a solution that makes use of the Host-Level Client-Side API that akka-http provides, as described here, together with a cyclic graph where the response of the first request is used as input to generate the second request, but I haven't tried that yet and I'm not sure whether it would work.
EDIT: After some days of playing around and reading the docs and some blogs, I'm not sure if I was on the right track with my assumption that the backpressure behavior of Source.unfoldAsync is the root cause. To add some more observations:
When the stream is started, I see several requests going out. This is no problem in the first place, as long as the resulting HttpResponse is consumed in a timely fashion (see here for a description).
If I don't change the default response-entity-subscription-timeout, I will run into the following error (I stripped out the URLs):
[WARN] [03/30/2019 13:44:58.984] [default-akka.actor.default-dispatcher-16] [default/Pool(shared->http://....)] [1 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 seconds. Make sure to read the response entity body or call discardBytes() on it. GET ... Empty -> 200 OK Chunked
This leads to an IllegalStateException that terminates the stream: java.lang.IllegalStateException: Substream Source cannot be materialized more than once
I observed that the unmarshalling of the response is the slowest part in the stream, which might make sense because the response body is a Multipart document and thereby relatively large. However, I would expect this part of the stream to signal less demand to the upstream (which is the Source.unfoldAsync part in my case). That should lead to fewer requests being made.
Some googling led me to a discussion about an issue that seems to describe a similar problem. They also discuss the problems that occur when a response is not processed fast enough. The associated merge request brings documentation changes that propose completely consuming the HttpResponse before continuing with the stream. In the discussion of the issue there are also doubts about whether it's a good idea to combine Akka Http with Akka Streams. So maybe I would have to change the implementation to do the unmarshalling directly inside the function called by unfoldAsync.
According to the implementation of Source.unfoldAsync, the passed-in function is only called when the source is pulled:
def onPull(): Unit = f(state).onComplete(asyncHandler)(akka.dispatch.ExecutionContexts.sameThreadExecutionContext)
So if the downstream is not pulling (i.e. is backpressuring), the function passed in to the source is not called.
In your gist you use runForeach (which is the same as runWith(Sink.foreach)), and that pulls the upstream as soon as the println has finished. So it is hard to notice backpressure there.
Try changing your example to runWith(Sink.queue), which will give you a SinkQueueWithCancel as the materialized value. Then, unless you call pull on the queue, the stream will be backpressured and will not issue requests.
Note that there could be one or more initial requests until the backpressure propagates through all of the stream.
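A minimal sketch of that experiment (assuming the read source from the question and an implicit materializer in scope; the value names initialRequest, queue and first are mine):
import akka.stream.scaladsl.{Sink, SinkQueueWithCancel}

// Materialize the crawl source with a queue sink: nothing flows until pull().
val queue: SinkQueueWithCancel[HttpResponse] =
  read(initialRequest).runWith(Sink.queue[HttpResponse]())

// Each pull demands exactly one element; between pulls the stream is
// backpressured and no further HTTP requests go out.
val first: Future[Option[HttpResponse]] = queue.pull()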
I think I figured it out. As I already mentioned in the edit of my question, I found this comment on an issue in Akka HTTP, where the author says:
...it is simply not best practice to mix Akka http into a larger processing stream. Instead, you need a boundary around the Akka http parts of the stream that ensures they always consume their response before allowing the outer processing stream to proceed.
So I went ahead and tried it: instead of doing the HTTP request and the unmarshalling in different stages of the stream, I directly unmarshal the response by flatMapping the Future[HttpResponse] into a Future[Multipart.General]. This makes sure that the HttpResponse is consumed immediately and avoids the Response entity was not subscribed after 1 second errors. The crawl function looks slightly different now, because it has to return the unmarshalled Multipart.General object (for further processing) as well as the original HttpResponse (to be able to construct the next request out of the headers):
def crawl(reqOption: Option[HttpRequest])(implicit actorSystem: ActorSystem, materializer: Materializer, executionContext: ExecutionContext): Future[Option[(Option[HttpRequest], (HttpResponse, Multipart.General))]] = {
  reqOption match {
    case Some(request) =>
      Http().singleRequest(request)
        .flatMap(response => Unmarshal(response).to[Multipart.General].map(multipart => (response, multipart)))
        .map {
          case tuple @ (response, multipart) =>
            if (response.status.isFailure()) Some((None, tuple))
            else nextRequest(response, HttpMethods.GET).map { case (req, res) => (req, (res, multipart)) }
        }
    case None => Future.successful(None)
  }
}
The rest of the code has to change because of that. I created another gist that contains code equivalent to that of the gist from the original question.
I was expecting the two Akka projects to integrate better (the docs don't mention this limitation at the moment; instead, the HTTP API seems to encourage the user to use Akka HTTP and Akka Streams together), so this feels a bit like a workaround, but it solves my problem for now. I still have to figure out some other problems I encounter when integrating this part into my larger use case, but that is not part of this question.

How to programmatically call Route in Akka Http

In Akka Http, it is possible to define a route system to manage a REST infrastructure in this way, as stated here: https://doc.akka.io/docs/akka-http/current/routing-dsl/overview.html
val route =
  get {
    pathSingleSlash {
      complete(HttpEntity(ContentTypes.`text/html(UTF-8)`, "<html><body>Hello world!</body></html>"))
    } ~
    path("ping") {
      complete("PONG!")
    } ~
    path("crash") {
      sys.error("BOOM!")
    }
  }
Is there a way to programmatically invoke one of the routes inside the same application, in a way similar to the following statement?
val response = (new Invoker(route = route, method = "GET", url = "/ping", body = null)).Invoke()
where response would be the same result as that of a remote HTTP call to the service?
The aforementioned API is only meant to give an idea of what I have in mind; I would expect the capability to set the content type, headers, and so on.
In the end I managed to find out the answer to my own question by digging a bit more in Akka HTTP documentation.
As stated here: https://doc.akka.io/docs/akka-http/current/routing-dsl/routes.html, the Route is a type defined as follows:
type Route = RequestContext => Future[RouteResult]
where RequestContext is a wrapper for the HttpRequest. But it is also true that a Route can be converted, implicitly or not, to other function types, like this:
def asyncHandler(route: Route)(...): HttpRequest ⇒ Future[HttpResponse]
Hence, it is indeed possible to "call" a route by converting it to another function type and then simply passing an HttpRequest built ad hoc, receiving a Future containing the desired response. The conversion required a little more time than the rest of the operations, but it's something that can be done while bootstrapping the application.
Note: the conversion requires these implicit values to be in scope, as stated here: https://doc.akka.io/docs/akka-http/current/introduction.html
implicit val system = ActorSystem("my-system")
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher
But these implicit values are already mandatory for the creation of the service itself.
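Putting it all together, a minimal sketch (assuming the route and the implicit values above are in scope; Route.asyncHandler is the conversion mentioned above):
import akka.http.scaladsl.model.{HttpMethods, HttpRequest, HttpResponse, Uri}
import akka.http.scaladsl.server.Route
import scala.concurrent.Future

// Convert the Route once (e.g. while bootstrapping), then reuse the handler.
val handler: HttpRequest => Future[HttpResponse] = Route.asyncHandler(route)

// "Call" the route in-process, as if the request had come over the wire.
val response: Future[HttpResponse] =
  handler(HttpRequest(method = HttpMethods.GET, uri = Uri("/ping")))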
If this is for unit tests, you can use akka-http's test kit.
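For example, with the route-level test kit (a sketch assuming ScalaTest and the route from the question):
import akka.http.scaladsl.testkit.ScalatestRouteTest
import org.scalatest.{Matchers, WordSpec}

class RouteSpec extends WordSpec with Matchers with ScalatestRouteTest {
  "the route" should {
    "answer to a ping" in {
      // Runs the request against the route in-memory; no server needed.
      Get("/ping") ~> route ~> check {
        responseAs[String] shouldEqual "PONG!"
      }
    }
  }
}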
If this is for the application itself, you should not go through the route, you should just invoke the relevant services that the controller would use directly. If that is inconvenient (too much copy-pasta), refactor until it becomes possible.
As for the reason: I want my application to be wrapped inside both a web server (which then uses the route the "normal" way) and a daemon that responds to inbound messages from a message broker.
I have an application that does something like that actually.
But I came at this from the other way: I consider the broker message to be the "primary" format. It is "routed" inside the consumer based purely on properties of the message itself (body contents, message key, topic name). The HTTP gateway is built on top of that: it has only a very limited number of API endpoints and routes (mostly for caller convenience; it might as well have just a single one) and constructs a message that it then passes off to the message consumer (in my case, via the broker actually, so that the HTTP gateway does not even have to be on the same host as the consumer).
As a result, I don't have to "re-use" the HTTP route because that does not really do anything. All the shared processing logic happens at the lower level (inside the service, inside the consumer).
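To make that shape concrete, a purely hypothetical sketch (all names are mine, not from the actual application):
// The broker message is the primary format; "routing" is a pattern match
// on message properties rather than on HTTP paths.
final case class BrokerMessage(topic: String, key: String, body: String)

// Stand-ins for the real services; the shared logic lives down here,
// not in the HTTP routes.
def handleUser(body: String): Unit  = println(s"user: $body")
def handleOrder(body: String): Unit = println(s"order: $body")

def consume(msg: BrokerMessage): Unit = msg.topic match {
  case "users"  => handleUser(msg.body)
  case "orders" => handleOrder(msg.body)
  case other    => println(s"unhandled topic: $other")
}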

What is the best way to combine akka-http flow in a scala-stream flow

I have a use case where, after n stages of an Akka Streams flow, I have to take the result of one of them and make a request to an HTTP REST API.
The last flow stage before the HTTP request emits a String:
val stream1:Flow[T,String,NotUsed] = Flow[T].map(_.toString)
Now the HTTP request should be specified; I thought about something like:
val stream2: Flow[String, Future[HttpResponse], NotUsed] =
  Flow[String].map(param => Http().singleRequest(HttpRequest(uri = s"host.com/$param")))
and then combine it:
val stream3 = stream1 via stream2
Is this the best way to do it? Which way would you recommend, and why? A couple of best-practice examples in the scope of this use case would be great!
Thanks in advance :)
Your implementation would create a new connection to "host.com" for each new param. This is unnecessary and prevents akka from making certain optimizations. Under the hood akka actually keeps a connection pool around to reuse open connections but I think it is better to specify your intentions in the code and not rely on the underlying implementation.
You can make a single connection as described in the documentation:
val connectionFlow: Flow[HttpRequest, HttpResponse, _] =
  Http().outgoingConnection("host.com")
To utilize this connection Flow you'll need to convert your String paths to HttpRequest objects:
import akka.http.scaladsl.model.Uri
import akka.http.scaladsl.model.Uri.Path

def pathToRequest(path: String) = HttpRequest(uri = Uri.Empty.withPath(Path(path)))

val reqFlow = Flow[String] map pathToRequest
And, finally, glue all the flows together:
val stream3 = stream1 via reqFlow via connectionFlow
This is the most common pattern for continuously querying the same server with different request objects.
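For illustration, a hypothetical end-to-end run of the combined pieces (the example Source and the println are mine; an implicit materializer is assumed in scope):
// Feed a couple of path params through the request-building and connection flows.
Source(List("/a", "/b"))
  .via(reqFlow)
  .via(connectionFlow)
  .runForeach(response => println(response.status))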

Play 2.5.x (Scala) -- How does one put a value obtained via wsClient into a (lazy) val

The use case is actually fairly typical. A lot of web services use authorization tokens that you retrieve at the start of a session and you need to send those back on subsequent requests.
I know I can do it like this:
lazy val myData = {
  val request = ws.url("/some/url")
    .withAuth(user, password, WSAuthScheme.BASIC)
    .withHeaders("Accept" -> "application/json")
  Await.result(request.get().map { x => x.json }, 120.seconds)
}
That just feels wrong, as all the docs say to never use Await.
Is there a Future/Promise Scala style way of handling this?
I've found .onComplete, which allows me to run code upon the completion of a Future; however, without using a (mutable) var I see no way of getting a value from that scope into a lazy val in a different scope. Even with a var there is a possible timing issue -- hence the evils of mutable variables :)
Any other way of doing this?
Unfortunately, there is no way to make this non-blocking - lazy vals are designed to be synchronous and to block any thread accessing them until they are completed with a value (internally a lazy val is represented as a simple synchronized block).
A Future/Promise Scala way would be to use a Future[T] or a Promise[T] instead of a val x: T. But that approach implies a great deal of overhead, with execution contexts and maps upon each use of the value, and the more optimal resource utilization may not be worth the decreased readability in all cases. So it may be OK to leave the Await there if you use the value extensively in many parts of your application.
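For completeness, a sketch of the Future-based variant (assuming ws, user and password from the question are in scope; the names myDataF and show are mine):
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.libs.json.JsValue
import play.api.libs.ws.WSAuthScheme
import play.api.mvc.{Action, Results}
import scala.concurrent.Future

// Cache the Future itself: the request is sent once, on first access.
lazy val myDataF: Future[JsValue] =
  ws.url("/some/url")
    .withAuth(user, password, WSAuthScheme.BASIC)
    .withHeaders("Accept" -> "application/json")
    .get()
    .map(_.json)

// Each consumer composes on the cached Future instead of blocking.
def show = Action.async {
  myDataF.map(json => Results.Ok(json))
}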

What effect does using Action.async have, since Play uses Netty which is non-blocking

Since Netty is a non-blocking server, what effect does changing an action to use .async have?
def index = Action { ... }
versus
def index = Action.async { ... }
I understand that with .async you will get a Future[SimpleResult]. But since Netty is non-blocking, will Play do something similar under the covers anyway?
What effect will this have on throughput/scalability? Or is this a hard question to answer, in that it depends on other factors?
The reason I am asking is that I have my own custom Action, and I want to reset the cookie timeout on every page request, so I am doing this, which is an async call:
object MyAction extends ActionBuilder[abc123] {
  def invokeBlock[A](request: Request[A], block: (abc123[A]) => Future[SimpleResult]) = {
    ...
    val result: Future[SimpleResult] = block(new abc123(..., result))
    result.map(_.withCookies(...))
  }
}
The takeaway from the above snippet is that I am using a Future[SimpleResult]; is this similar to calling Action.async, but inside of my Action itself?
I want to understand what effect this will have on my application design. It seems like, just for the ability to set my cookie on a per-request basis, I have changed from blocking to non-blocking. But I am confused, since Netty is non-blocking; maybe I haven't really changed anything in reality, as it was already async?
Or have I simply created another async call embedded in another one?
Hoping someone can clarify this with some details and how or what effect this will have in performance/throughput.
def index = Action { ... } is non-blocking, you are right.
The purpose of Action.async is simply to make it easier to work with Futures in your actions.
For example:
def index = Action.async {
  val allOptionsFuture: Future[List[UserOption]] = optionService.findAll()
  allOptionsFuture map { options =>
    Ok(views.html.main(options))
  }
}
Here my service returns a Future, and to avoid dealing with extracting the result I just map it to a Future[SimpleResult] and Action.async takes care of the rest.
If my service was returning List[UserOption] directly I could just use Action.apply, but under the hood it would still be non-blocking.
If you look at Action source code, you can even see that apply eventually calls async:
https://github.com/playframework/playframework/blob/2.3.x/framework/src/play/src/main/scala/play/api/mvc/Action.scala#L432
I happened to come across this question. I like the answer from @vptheron, and I also want to share something I read in the book "Reactive Web Applications", which I think is also great.
The Action.async builder expects to be given a function of type Request => Future[Result]. Actions declared in this fashion are not much different from plain Action { request => ... } calls, the only difference is that Play knows that Action.async actions are already asynchronous, so it doesn’t wrap their contents in a future block.
That’s right — Play will by default schedule any Action body to be executed asynchronously against its default web worker pool by wrapping the execution in a future. The only difference between Action and Action.async is that in the second case, we’re taking care of providing an asynchronous computation.
It also presented one sample:
def listFiles = Action { implicit request =>
  val files = new java.io.File(".").listFiles
  Ok(files.map(_.getName).mkString(", "))
}
which is problematic, given its use of the blocking java.io.File API.
Here the java.io.File API is performing a blocking I/O operation, which means that one of the few threads of Play's web worker pool will be hijacked while the OS figures out the list of files in the execution directory. This is the kind of situation you should avoid at all costs, because it means that the worker pool may run out of threads.
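A common mitigation (my sketch, not from the book; the "contexts.blocking-io" dispatcher name is an assumption and would need a matching entry in application.conf) is to shift such blocking calls onto a dedicated thread pool:
import play.api.Play.current
import play.api.libs.concurrent.Akka
import play.api.mvc.{Action, Results}
import scala.concurrent.{ExecutionContext, Future}

// A dedicated pool for blocking I/O, looked up from configuration,
// so the default web worker pool is never hijacked.
implicit val blockingIoEc: ExecutionContext =
  Akka.system.dispatchers.lookup("contexts.blocking-io")

def listFiles = Action.async {
  Future {
    val files = new java.io.File(".").listFiles
    Results.Ok(files.map(_.getName).mkString(", "))
  }
}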
The reactive audit tool, available at https://github.com/octo-online/reactive-audit, aims to point out blocking calls in a project.
Hope it helps, too.