How can I intercept HTTP client requests made with Akka HTTP? - scala

I have several components that proxy requests to a REST service and deserialize the result as appropriate. For example, something like:
import scala.concurrent.Future
import akka.http.scaladsl.HttpExt
import akka.http.scaladsl.model.HttpRequest
import akka.http.scaladsl.unmarshalling.Unmarshal

trait UsersResource {
  val http: HttpExt

  def getUser(id: String): Future[User] =
    http.singleRequest(HttpRequest(...))
      .flatMap(r => Unmarshal(r.entity).to[User])

  def findUsers(query: Any): Future[List[User]]
}
I'd like to somehow proxy each of these requests, so that I can modify the request (e.g. to add headers) or modify the response. Specifically, I'm interested in adding code that:
logs and monitors each request/response
adds cookies to each request
adds auth to each request
transforms the response body
Since each specific resource typically has these steps in common (and in some cases this logic is common across all resources), I'd like to change the http: HttpExt field somehow so that these steps are applied.
Is anything like this possible with Akka HTTP?
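To make the intent concrete, here is a rough sketch of the kind of wrapper I have in mind (the class and method names are made up, not an existing API):

import scala.concurrent.{ExecutionContext, Future}
import akka.http.scaladsl.HttpExt
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.http.scaladsl.model.headers.RawHeader

// Hypothetical wrapper: every outgoing request passes through `prepare`,
// and every response passes through `postProcess` before being handed back.
class WrappedClient(http: HttpExt)(implicit ec: ExecutionContext) {

  private def prepare(request: HttpRequest): HttpRequest =
    request.addHeader(RawHeader("X-Correlation-Id", "some-id")) // cookies/auth would go here too

  private def postProcess(response: HttpResponse): Future[HttpResponse] =
    Future.successful(response) // log the status, transform the entity, ...

  def singleRequest(request: HttpRequest): Future[HttpResponse] =
    http.singleRequest(prepare(request)).flatMap(postProcess)
}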
I came across this question, which seems to touch on part of what I'm asking (specifically the logging/monitoring part); however, the accepted answer appears to use HTTP directives on the server side rather than on the client.

Related

Akka Streams - Backpressure for Source.unfoldAsync

I'm currently trying to read a paginated HTTP resource. Each page is a Multipart document, and the response for a page includes a next link in the headers if there is a page with more content. An automated parser can then start at the oldest page and read page by page, using the headers to construct the request for the next page.
I'm using Akka Streams and Akka Http for the implementation, because my goal is to create a streaming solution. I came up with this (I will include only the relevant parts of the code here, feel free to have a look at this gist for the whole code):
def read(request: HttpRequest): Source[HttpResponse, _] =
  Source.unfoldAsync[Option[HttpRequest], HttpResponse](Some(request))(Crawl.crawl)

val parse: Flow[HttpResponse, General.BodyPart, _] = Flow[HttpResponse]
  .flatMapConcat(r => Source.fromFuture(Unmarshal(r).to[Multipart.General]))
  .flatMapConcat(_.parts)

....
def crawl(reqOption: Option[HttpRequest]): Future[Option[(Option[HttpRequest], HttpResponse)]] = reqOption match {
  case Some(req) =>
    Http().singleRequest(req).map { response =>
      if (response.status.isFailure()) Some((None, response))
      else nextRequest(response, HttpMethods.GET)
    }
  case None => Future.successful(None)
}
So the general idea is to use Source.unfoldAsync to crawl through the pages and perform the HTTP requests (the idea and implementation are very close to what's described in this answer). This creates a Source[HttpResponse, _] that can then be consumed (unmarshalled to Multipart, split up into the individual parts, ...).
My problem now is that consuming the HttpResponses might take a while (unmarshalling takes some time if the pages are large, and maybe there will be some database requests at the end to persist some data, ...). So I would like Source.unfoldAsync to backpressure if the downstream is slower. By default, the next HTTP request is started as soon as the previous one has finished.
So my question is: Is there some way to make Source.unfoldAsync backpressure on a slow downstream? If not, is there an alternative that makes backpressuring possible?
I can imagine a solution that makes use of the Host-Level Client-Side API that akka-http provides, as described here, together with a cyclic graph where the response of the first request is used as input to generate the second request, but I haven't tried that yet and I'm not sure whether it would work.
EDIT: After some days of playing around and reading the docs and some blogs, I'm not sure if I was on the right track with my assumption that the backpressure behavior of Source.unfoldAsync is the root cause. To add some more observations:
When the stream is started, I see several requests going out. This is not a problem per se, as long as each resulting HttpResponse is consumed in a timely fashion (see here for a description).
If I don't change the default response-entity-subscription-timeout, I will run into the following error (I stripped out the URLs):
[WARN] [03/30/2019 13:44:58.984] [default-akka.actor.default-dispatcher-16] [default/Pool(shared->http://....)] [1 (WaitingForResponseEntitySubscription)] Response entity was not subscribed after 1 seconds. Make sure to read the response entity body or call discardBytes() on it. GET ... Empty -> 200 OK Chunked
This leads to an IllegalStateException that terminates the stream: java.lang.IllegalStateException: Substream Source cannot be materialized more than once
I observed that unmarshalling the response is the slowest part of the stream, which might make sense because the response body is a Multipart document and therefore relatively large. However, I would expect this part of the stream to signal less demand to the upstream (which is the Source.unfoldAsync part in my case), which should result in fewer requests being made.
Some googling led me to a discussion about an issue that seems to describe a similar problem. It also discusses the problems that occur when a response is not processed fast enough. The associated merge request brings documentation changes that propose consuming the HttpResponse completely before continuing with the stream. In the discussion of the issue there are also doubts about whether it's a good idea to combine Akka HTTP with Akka Streams at all. So maybe I would have to change the implementation to do the unmarshalling directly inside the function that's called by unfoldAsync.
According to the implementation of Source.unfoldAsync, the passed-in function is only called when the source is pulled:
def onPull(): Unit = f(state).onComplete(asyncHandler)(akka.dispatch.ExecutionContexts.sameThreadExecutionContext)
So if the downstream is not pulling (i.e. is backpressuring), the function passed to the source is not called.
In your gist you use runForeach (which is the same as runWith(Sink.foreach)), which pulls the upstream as soon as the println has finished. So it is hard to observe backpressure there.
Try changing your example to runWith(Sink.queue), which will give you a SinkQueueWithCancel as the materialized value. Then, unless you call pull on the queue, the stream will be backpressured and will not issue requests.
Note that there could be one or more initial requests until the backpressure propagates through the whole stream.
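For illustration, a minimal sketch of that change, assuming the read and parse stages from your question, a made-up initialRequest, and an implicit materializer and execution context in scope:

import akka.stream.scaladsl.{Sink, SinkQueueWithCancel}

// Materialize the stream with a queue sink instead of runForeach.
val queue: SinkQueueWithCancel[Multipart.General.BodyPart] =
  read(initialRequest).via(parse).runWith(Sink.queue[Multipart.General.BodyPart]())

// Elements are only delivered on demand; apart from the initial buffering
// mentioned above, no further requests are issued until pull() is called.
queue.pull().foreach {
  case Some(part) => println(part) // process one part, then call pull() again
  case None       => println("stream completed")
}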
I think I figured it out. As I already mentioned in the edit to my question, I found this comment on an issue in Akka HTTP, where the author says:
...it is simply not best practice to mix Akka http into a larger processing stream. Instead, you need a boundary around the Akka http parts of the stream that ensures they always consume their response before allowing the outer processing stream to proceed.
So I went ahead and tried it: instead of doing the HTTP request and the unmarshalling in different stages of the stream, I unmarshal the response directly by flatMapping the Future[HttpResponse] into a Future[Multipart.General]. This makes sure that the HttpResponse is consumed immediately and avoids the Response entity was not subscribed after 1 second errors. The crawl function now looks slightly different, because it has to return the unmarshalled Multipart.General object (for further processing) as well as the original HttpResponse (to be able to construct the next request from its headers):
def crawl(reqOption: Option[HttpRequest])(implicit actorSystem: ActorSystem, materializer: Materializer, executionContext: ExecutionContext): Future[Option[(Option[HttpRequest], (HttpResponse, Multipart.General))]] = {
  reqOption match {
    case Some(request) =>
      Http().singleRequest(request)
        .flatMap(response => Unmarshal(response).to[Multipart.General].map(multipart => (response, multipart)))
        .map {
          case tuple @ (response, multipart) =>
            if (response.status.isFailure()) Some((None, tuple))
            else nextRequest(response, HttpMethods.GET).map { case (req, res) => (req, (res, multipart)) }
        }
    case None => Future.successful(None)
  }
}
The rest of the code has to change because of that. I created another gist that contains code equivalent to the gist from the original question.
I was expecting the two Akka projects to integrate better (the docs don't mention this limitation at the moment, and the HTTP API instead seems to encourage combining Akka HTTP and Akka Streams), so this feels a bit like a workaround, but it solves my problem for now. I still have to figure out some other problems I encounter when integrating this part into my larger use case, but that is not part of this question.

How to programmatically call Route in Akka Http

In Akka HTTP, it is possible to define a route system to manage a REST infrastructure in this way, as described here: https://doc.akka.io/docs/akka-http/current/routing-dsl/overview.html
val route =
  get {
    pathSingleSlash {
      complete(HttpEntity(ContentTypes.`text/html(UTF-8)`, "<html><body>Hello world!</body></html>"))
    } ~
    path("ping") {
      complete("PONG!")
    } ~
    path("crash") {
      sys.error("BOOM!")
    }
  }
Is there a way to programmatically invoke one of these routes from inside the same application, in a way similar to the following statement?
val response = (new Invoker(route = route, method = "GET", url = "/ping", body = null)).Invoke()
where response would be the same result as that of a remote HTTP call to the service?
The aforementioned API is only meant to give an idea of what I have in mind; I would expect the ability to set the content type, headers, and so on.
In the end I managed to find the answer to my own question by digging a bit more into the Akka HTTP documentation.
As stated here: https://doc.akka.io/docs/akka-http/current/routing-dsl/routes.html, Route is a type defined as follows:
type Route = RequestContext => Future[RouteResult]
where RequestContext is a wrapper for the HttpRequest. But it is also true that a Route can be converted, implicitly or explicitly, to other function types, such as:
def asyncHandler(route: Route)(...): HttpRequest ⇒ Future[HttpResponse]
Hence, it is indeed possible to "call" a route by converting it to another function type and then simply passing an HttpRequest built ad hoc, receiving a Future containing the desired response. The conversion takes a little more time than the rest of the operations, but it is something that can be done while bootstrapping the application.
Note: the conversion requires the following implicit values to be in scope, as stated here: https://doc.akka.io/docs/akka-http/current/introduction.html
implicit val system = ActorSystem("my-system")
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher
But these are already required to create the service itself.
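Putting it together, a minimal sketch of the approach (using Route.asyncHandler for the conversion; the exact set of required implicits may differ between Akka HTTP versions):

import akka.actor.ActorSystem
import akka.http.scaladsl.model.{HttpMethods, HttpRequest, HttpResponse}
import akka.http.scaladsl.server.Route
import akka.stream.ActorMaterializer
import scala.concurrent.Future

implicit val system = ActorSystem("my-system")
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher

// Convert the Route once, e.g. while bootstrapping the application ...
val handler: HttpRequest => Future[HttpResponse] = Route.asyncHandler(route)

// ... and then "call" it with an ad-hoc request:
val response: Future[HttpResponse] =
  handler(HttpRequest(method = HttpMethods.GET, uri = "/ping"))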
If this is for unit tests, you can use akka-http's test kit.
If this is for the application itself, you should not go through the route, you should just invoke the relevant services that the controller would use directly. If that is inconvenient (too much copy-pasta), refactor until it becomes possible.
As for the reason: I want my application to be wrapped in both a web server (which then uses the route the "normal" way) and a daemon that responds to inbound messages from a message broker.
I have an application that does something like that actually.
But I came at this from the other direction: I consider the broker message to be the "primary" format. It is "routed" inside the consumer based purely on properties of the message itself (body contents, message key, topic name). The HTTP gateway is built on top of that: it has only a very limited number of API endpoints and routes (mostly for caller convenience; it might as well have just a single one) and constructs a message that it then passes off to the message consumer (in my case via the broker, so the HTTP gateway does not even have to be on the same host as the consumer).
As a result, I don't have to "re-use" the HTTP route because that does not really do anything. All the shared processing logic happens at the lower level (inside the service, inside the consumer).

What is the best way to combine akka-http flow in a scala-stream flow

I have a use case where, after n akka-stream flows, I have to take the result of one of them and make a request to an HTTP REST API.
The type of the last akka-stream flow before the HTTP request is a String:
val stream1: Flow[T, String, NotUsed] = Flow[T].map(_.toString)
Now the HTTP request should be specified; I thought about something like:
val stream2: Flow[String, Future[HttpResponse], NotUsed] =
  Flow[String].map(param => Http().singleRequest(HttpRequest(uri = s"host.com/$param")))
and then combine it:
val stream3 = stream1 via stream2
Is this the best way to do it? Which approach would you recommend, and why? A couple of best-practice examples in the scope of this use case would be great!
Thanks in advance :)
Your implementation would create a new connection to "host.com" for each new param. This is unnecessary and prevents akka from making certain optimizations. Under the hood akka actually keeps a connection pool around to reuse open connections, but I think it is better to state your intentions in the code rather than rely on the underlying implementation.
You can make a single connection as described in the documentation:
val connectionFlow: Flow[HttpRequest, HttpResponse, _] =
  Http().outgoingConnection("host.com")
To utilize this connection Flow you'll need to convert your String paths to HttpRequest objects:
import akka.http.scaladsl.model.Uri
import akka.http.scaladsl.model.Uri.Path

def pathToRequest(path: String) = HttpRequest(uri = Uri.Empty.withPath(Path(path)))

val reqFlow = Flow[String] map pathToRequest
And, finally, glue all the flows together:
val stream3 = stream1 via reqFlow via connectionFlow
This is the most common pattern for continuously querying the same server with different request objects.
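For completeness, a small hypothetical usage sketch (elements is a made-up collection of T values; an implicit materializer is assumed to be in scope):

import scala.concurrent.Future
import akka.stream.scaladsl.{Sink, Source}

// Run some T elements through the combined flow and collect the responses.
// Remember to consume or discard each response entity downstream.
val responses: Future[Seq[HttpResponse]] =
  Source(elements).via(stream3).runWith(Sink.seq)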

Spray/Scala - Setting timeout on specific request

I currently have a REST call set up using a spray pipeline. If I don't get a response within x seconds, I want it to time out, but only for that particular call. When making a spray-client pipeline request, is there a good way to specify a timeout specific to that particular call?
As far as I can tell, as of spray-client 1.3.1 there is no way to customise a pipeline after it has been created.
However, you can create custom pipelines for different types of requests.
It's worth mentioning that the timeouts defined below apply to the ask() calls, not to the network operations, but I guess that is what you need based on your description.
I found the following article very useful in understanding a bit better how the library works behind the scenes: http://kamon.io/teamblog/2014/11/02/understanding-spray-client-timeout-settings/
Disclaimer: I haven't actually tried this, but I guess it should work:
val timeout1 = Timeout(5 minutes)
val timeout2 = Timeout(1 minute)

val pipeline1: HttpRequest => Future[HttpResponse] =
  sendReceive(implicitly[ActorRefFactory], implicitly[ExecutionContext], timeout1)

val pipeline2: HttpRequest => Future[HttpResponse] =
  sendReceive(implicitly[ActorRefFactory], implicitly[ExecutionContext], timeout2)
and then you use the appropriate pipeline for each request.
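For example (the URLs are just placeholders):

import spray.client.pipelining._

// Use the pipeline whose ask-timeout matches the expected duration of the call.
val slowCall: Future[HttpResponse] = pipeline1(Get("http://example.com/slow-endpoint"))
val fastCall: Future[HttpResponse] = pipeline2(Get("http://example.com/fast-endpoint"))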

looking for a http client in scala that handles redirects

I am looking for an HTTP client in Scala that handles redirects. How do I fetch the content of a URL in Scala, handling redirects?
I saw the scala.io.Source examples, but they don't handle redirects.
If you don't want to use something like HttpClient (which is probably better for anything beyond toy examples), you can tinker with the URLConnection:
import scala.io.Source

def urlToStream(url: String) = Source.fromInputStream(
  (new java.net.URL(url).openConnection match {
    case connection: java.net.HttpURLConnection =>
      connection.setInstanceFollowRedirects(true)
      connection
    case connection => connection
  }).getInputStream
)
This will turn on redirect following if the protocol is HTTP.
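Usage is then just (the URL is a placeholder):

// Fetch a page as a string, following any HTTP redirects along the way.
val body: String = urlToStream("http://example.com/some-page").mkString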
You can use Finagle to build a client. It is pretty low-level though, working directly at the HttpRequest => Future[HttpResponse] level, so it requires a small amount of work to get it to handle redirects.
Did you check out Dispatch? http://dispatch.databinder.net/Dispatch.html
It wraps HttpClient, so you can do anything HttpClient can do, but in a Scala way. IMO, it's a bit heavy on weird operators, and should spell more things out, but I have been using it for a year or two and like many things about it.