PlayWS calculate the size of a http call without consuming the stream - scala

I'm currently using the PlayWS http client which returns an Akka stream. From my understanding, I can consume the stream and turn it into a Byte[] to calculate the size. However, this also consumes the stream and I can't use it anymore. Anyway around this?

I think there are two different aspects related to the question.
You want to know the size of the server response in advance to prepare buffer. Unfortunately there is no guaranteed way to do this. HTTP 1.1 spec explicitly allows transfer mode when the server does not know the size of the response in advance via chunked transfer encoding. See also quote from 3.3.1. Transfer-Encoding:
A recipient MUST be able to parse the chunked transfer coding
(Section 4.1) because it plays a crucial role in framing messages
when the payload body size is not known in advance.
Section 3.3.3. Message Body Length specifies how length of a message body is defined and it besides the aforementioned chunked transfer encoding it also contains quite unhelpful
Otherwise, this is a response message without a declared message
body length, so the message body length is determined by the
number of octets received prior to the server closing the
connection.
This is added for backward compatibility and discouraged from usage but is still legally allowed.
Still in many real world scenarios you can use Content-Length header field that the server may return. However there is a catch here as well: if gzip Content-Encoding is used, then Content-Length will contain size of the compressed body.
To sum up: in general case you can't get the size of the message body in advance before you fully get the server response i.e. in terms of code perform a blocking call on the response. You may try to use Content-Length and it might or might not help in your specific case.
You already have a fully downloaded response (or you are OK with blocking on your StreamedResponse) and you want to process it by first getting the size and only then processing the actual data. In such case you may first use getBodyAsBytes method which returns IndexedSeq[Byte] and thus has size, and then convert it into a new Source using Source.single which is actually exactly what the default (i.e. non-streaming) implementation of getBodyAsSource does.

Related

I need simple proxy between 2 rest APIs

My code is working ok for GET/POST/PUT to/from restApi1 and restApi2.
However, my problem I need to implement HEAD/OPTIONS (no body!) and GET uri1
HEAD/OPTIONS could return 204 or 200 depends on a process status. I am getting error "Stream closed". Sounds like Camel want body bytes, but I don't indend to have it. Even I set ExchangePattern.InOnly or optional etc error occur...
What is correct way to see responses and handle requests WITHOUT body, just statuses exchange?
How to see response from restApi2 on Camel rest("/restApi1").head().route().routeId("id1")
.to("direct:restApi2").routeId("/id1").setHeader(Exchange.HTTP_METHOD,constant("HEAD"))
setExchanggePattern(ExchangePattern.OutOptionalIn).recepientList(simple(restApi2));
I figured it out. Need to set '.convertBodyTo(String.class)' even I don't have a body.

Play! Framework 2.6! Gzip Filter if the response size is greater than 50bytes

I am currently working with Play! Framework 2.6. I am looking into gzipping my response if they are greater than 80bytes. However, With the Framework there is no way to perform this. Based on this Documentation I can make use of the ff code snippet
new GzipFilter(shouldGzip = (request, response) =>
response.body.contentType.exists(_.startsWith("text/html")))
However it did not specify on where would I create this. Any idea how I can specify if it should a gzip a certain response if its greater than 50bytes?
By default, the response bodies are streamed which means you do not know how big the size of the response body will be.
If you know the size of the response body already (e.g. you're serving a file from Amazon S3 already know the file size) You can set the Content-Length header and check it in GzipFilter.
You will also likely need to implement your own GzipFilter and adapt it so it checks the Content-Length.

How to handle error responses in a REST endpoint that accepts different Accept header values.

I'm trying to add a new content type to a REST endpoint. Currently it only returns json but I now need to be able to return also a CSV file.
As far as I know, the best way to do this is by using the Accept header with value text/csv and then add a converter that is able to react to this and convert the returned body to the proper CSV representation.
I've been able to do this but then I have a problem handling exceptions. Up until know, all the errors returned are in json. The frontend expects any 500 status code to contain a specific body with the error. But now, by adding the option to return either application/json or text/csv to my endpoint, in case of an error, the converter to be used to transform the body is going to be either the jackson converter or my custom one depending on the Accept header passed. Moreover, my frontend is going to need to read the content-type returned and parse the value based on the type of representation returned.
Is this the normal approach to handle this situation?
A faster workaround would be to forget about the Accept header and include a url parameter indicating the format expected. Doing it this way, I'd be able to change the content-type of the response and the parsing of the data directly in the controller as the GET request won't include any Accept header and it will be able to accept anything. There are some parts of the code already doing this where the only expected response format is CSV so I'm going to have a difficult time defending the use of the Accept header unless there is a better way of handling this.
my frontend is going to need to read the content-type returned and parse the value based on the type of representation returned.
Is this the normal approach to handle this situation?
Yes.
For example, RFC 7807 describes a common format for describing problems. So the server would send an application/problem+json or an application/problem+xml representation of the issue in the response, along with the usual meta data in the headers.
Consumers that understand application/problem+json can parse the data with in, and forward a useful description of the problem to the user/logs whatever. Consumers that don't understand that representation are limited to acting on the information in the headers.
A faster workaround would be to forget about the Accept header and include a url parameter indicating the format expected.
That's also fine -- more precisely, you can have a different resource responsible for the each of the different media-types that you support.
It may be useful to review section 3.4 of RFC 7231, which describes the semantics of content negotiation.

Ensure Completeness of HTTP Messages

I am currently working on an application that is supposed to get a web page and extract information from its content.
As I learned from my research (or as it seems to me at least), there is no ideal way to determine the end of an HTTP message.
Generally, I found two different ways to do so:
Set O_NONBLOCK flag for the socket and fetch data with recv() in a while loop. Assume that the message is complete and break if it occurs once that there are no bytes in the stream.
Rely on the HTTP Content-Length header and determine the end of the message with it.
Both ways don't seem to be completely safe to me. Solution (1) could possibly break the recv loop before the message was completed. On the other hand, solution (2) requires the Content-Length header to be set correctly.
What's the best way to proceed in this case? Can I always rely on the Content-Length header to be set?
Let me start here:
Can I always rely on the Content-Length header to be set?
No, you can't. Content-Length is an optional header. However, HTTP messages absolutely must feature a way to determine their body length if they are to be RFC-compliant (cf RFC7230, sec. 3.3.3). That being said, get ready to parse on chunked encoding whenever a content length isn't specified.
As for your original problem: Ensuring the completeness of a message is actually something that should be TCP's job. But as there are such complicated things like message pipelining around, it is best to check for two things in practice:
Have all reads from the network buffer been successful?
Is the number of the received bytes identical to the predicted message length?
Oh, and as #MartinJames noted, non-blocking probably isn't the best idea here.
The end of a HTTP response is defined:
By the final (empty) chunk in case Transfer-Encoding chunked is used.
By reaching the given length if a Content-length header is given and no chunked transfer encoding is used.
By the end of the TCP connection if neither chunked transfer encoding is used not Content-length is given.
In the first two cases you have a well defined end so you can verify that the data were fully received. Only in the last case (end of TCP connection) you don't know if the connection was closed before sending all the data. But usually you get either case 1 or case 2.
To make your life easier, you might want to provide
Connection: close
header when making HTTP request - than web-server will close connection after giving you the full page requested and you will not have to deal with chunks.
It is only a viable option if you only are interested in this single page, and will not request additional resources (script files, images, etc) - in latter case this will be a very inefficient solution for both your app and the server.

How is determining body length by closing connection reliable (RFC 2616 4.4.5)

I can't get one thing straight. The RFC 2616 in 4.4.5 states that Message Length can be determined "By the server closing the connection.".
This implies, that it is valid for a server to respond (e.g. returning a large image) with a response, that has no Content-Length in the header, but the client is supposed to keep fetching till the connection is closed and then assume all data has been downloaded.
But how is a client to know for sure that the connection was closed intentionally by the server? A server app could have crashed in the middle of sending the data and the server's OS would most likely send FIN packet to gracefully close the TCP connection with the client.
You are absolutely right, that mechanism is totally unreliable. This is covered in RFC 7230:
Since there is no way to distinguish a successfully completed,
close-delimited message from a partially received message interrupted
by network failure, a server SHOULD generate encoding or
length-delimited messages whenever possible. The close-delimiting
feature exists primarily for backwards compatibility with HTTP/1.0.
Fortunately most of HTTP traffic today are HTTP/1.1, with Content-Length or "Transfer-Encoding" to explicitly define the end of message.
The lesson is that, a message must have it own way of termination; we cannot repurpose the underlying transport layer's EOF as the message's EOF.
On that note, a (well-formed) html document, or a .gif, .avi etc, does define its own termination; we will know if we received an incomplete document. Therefore it is not so much of a problem to transmit it over HTTP/1.0 without Content-Length.
However, for plain text document, javascript, css etc. EOF is used to marked the end of the document, therefore it's problematic over HTTP/1.0.