Play! Framework 2.6! Gzip Filter if the response size is greater than 50bytes - scala

I am currently working with Play! Framework 2.6. I am looking into gzipping my response if they are greater than 80bytes. However, With the Framework there is no way to perform this. Based on this Documentation I can make use of the ff code snippet
new GzipFilter(shouldGzip = (request, response) =>
response.body.contentType.exists(_.startsWith("text/html")))
However it did not specify on where would I create this. Any idea how I can specify if it should a gzip a certain response if its greater than 50bytes?

By default, the response bodies are streamed which means you do not know how big the size of the response body will be.
If you know the size of the response body already (e.g. you're serving a file from Amazon S3 already know the file size) You can set the Content-Length header and check it in GzipFilter.
You will also likely need to implement your own GzipFilter and adapt it so it checks the Content-Length.

Related

How do you get a bytes object from Google Cloud Storage Bucket

My question at Github
https://github.com/googleapis/python-speech/issues/52
has been active for 9 days and the only two people to have attempted an answer have both failed but now I think it might be possible for someone to answer it who understands how Google Cloud Buckets work even though they do not understand how Google's Speech Api works. In order to convert long audio files to text they first must be uploaded to the Cloud. I was using some syntax that now appears to be broken and the following syntax might work except that Google does not explain how to use this code in coordination with files uploaded to the Cloud. So in the code below published here:
https://cloud.google.com/speech-to-text/docs/async-recognize#speech_transcribe_async-python
The content object has to be located on the cloud and it needs to be a bytes object. Suppose the address of the object is: gs://audio_files/cool_audio
What syntax would I use such that the content object refers to a bytes object?
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
client = speech.SpeechClient()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=16000,
language_code='en-US')
operation = client.long_running_recognize(config, audio)
print('Waiting for operation to complete...')
response = operation.result(timeout=90)
My previous answer didn't really address your question. Let me try again:
Please try this:
audio = types.RecognitionAudio(content=bytes(content, 'utf-8'))
GCS stores objects as a sequence of bytes. If your object has a Content-Encoding header that can cause the content to be transformed while downloading (e.g., gzip content will be uncompressed if the client doesn't supply an Accept-Encoding: gzip header); and if it has a Content-Type header the client application or library may treat the information differently.

How to handle error responses in a REST endpoint that accepts different Accept header values.

I'm trying to add a new content type to a REST endpoint. Currently it only returns json but I now need to be able to return also a CSV file.
As far as I know, the best way to do this is by using the Accept header with value text/csv and then add a converter that is able to react to this and convert the returned body to the proper CSV representation.
I've been able to do this but then I have a problem handling exceptions. Up until know, all the errors returned are in json. The frontend expects any 500 status code to contain a specific body with the error. But now, by adding the option to return either application/json or text/csv to my endpoint, in case of an error, the converter to be used to transform the body is going to be either the jackson converter or my custom one depending on the Accept header passed. Moreover, my frontend is going to need to read the content-type returned and parse the value based on the type of representation returned.
Is this the normal approach to handle this situation?
A faster workaround would be to forget about the Accept header and include a url parameter indicating the format expected. Doing it this way, I'd be able to change the content-type of the response and the parsing of the data directly in the controller as the GET request won't include any Accept header and it will be able to accept anything. There are some parts of the code already doing this where the only expected response format is CSV so I'm going to have a difficult time defending the use of the Accept header unless there is a better way of handling this.
my frontend is going to need to read the content-type returned and parse the value based on the type of representation returned.
Is this the normal approach to handle this situation?
Yes.
For example, RFC 7807 describes a common format for describing problems. So the server would send an application/problem+json or an application/problem+xml representation of the issue in the response, along with the usual meta data in the headers.
Consumers that understand application/problem+json can parse the data with in, and forward a useful description of the problem to the user/logs whatever. Consumers that don't understand that representation are limited to acting on the information in the headers.
A faster workaround would be to forget about the Accept header and include a url parameter indicating the format expected.
That's also fine -- more precisely, you can have a different resource responsible for the each of the different media-types that you support.
It may be useful to review section 3.4 of RFC 7231, which describes the semantics of content negotiation.

PlayWS calculate the size of a http call without consuming the stream

I'm currently using the PlayWS http client which returns an Akka stream. From my understanding, I can consume the stream and turn it into a Byte[] to calculate the size. However, this also consumes the stream and I can't use it anymore. Anyway around this?
I think there are two different aspects related to the question.
You want to know the size of the server response in advance to prepare buffer. Unfortunately there is no guaranteed way to do this. HTTP 1.1 spec explicitly allows transfer mode when the server does not know the size of the response in advance via chunked transfer encoding. See also quote from 3.3.1. Transfer-Encoding:
A recipient MUST be able to parse the chunked transfer coding
(Section 4.1) because it plays a crucial role in framing messages
when the payload body size is not known in advance.
Section 3.3.3. Message Body Length specifies how length of a message body is defined and it besides the aforementioned chunked transfer encoding it also contains quite unhelpful
Otherwise, this is a response message without a declared message
body length, so the message body length is determined by the
number of octets received prior to the server closing the
connection.
This is added for backward compatibility and discouraged from usage but is still legally allowed.
Still in many real world scenarios you can use Content-Length header field that the server may return. However there is a catch here as well: if gzip Content-Encoding is used, then Content-Length will contain size of the compressed body.
To sum up: in general case you can't get the size of the message body in advance before you fully get the server response i.e. in terms of code perform a blocking call on the response. You may try to use Content-Length and it might or might not help in your specific case.
You already have a fully downloaded response (or you are OK with blocking on your StreamedResponse) and you want to process it by first getting the size and only then processing the actual data. In such case you may first use getBodyAsBytes method which returns IndexedSeq[Byte] and thus has size, and then convert it into a new Source using Source.single which is actually exactly what the default (i.e. non-streaming) implementation of getBodyAsSource does.

Is it ok to return application/octet-stream from a REST interface?

Am I breaking any laws in the REST bible by returning application/octet-stream for my responses ? The REST endpoint receives 5 image urls.
{ "image1": "http://ww.o.com/1.gif",
"image2": "http://www.foo.be/2.gif" }
and it will download these and return them as application/octet-stream.
CLARIFICATION: The client that invokes this REST interface is a mobile app. Every additional network connections made will reduce battery life by a few milliamps. I am forced to use REST because it is a company standard. If not, I will do my own binary protocol.
It is not so good, as the client will not know what to do with such binary data except of storing those bytes somewhere or sending them further to some other process (if this is all you need to do with your data, then it is fine).
You may take a look at multipart content types. IMO, a multipart message containing several image/gif parts would be a better alternative.
From the sounds of this, this sounds much more like an RPC call. Specifically, "here's a list of URLs, send me back an archive".
That process is not particularly RESTful, as REST is not an RPC based system.
What you need to do is treat the archives as reources, and a way to create and then serve them up.
For example you could:
POST /archives
Content-Type: application/json
{ "image1": "http://ww.o.com/1.gif",
"image2": "http://www.foo.be/2.gif" }
As a result, you would get
HTTP/1.1 201 Created
Location: http://example.com/archives/1234
Content-Type: application/json
Then, you could make a request to http://example.com:
GET /archives/1234
Accept: multipart/mixed
Here, you will get the actual archive in a single request (like you want), only it's a multipart formatted result. (multipart/x-zip would work too, that's a zip file)
If you did:
GET /archives/1234
Accept: application/json
You would get back the JSON you sent originally (so you could, perhaps, edit and update the archive, something you may not want to support sending up the binary images).
To change it you would simply POST back the update:
PUT /archives/1234
Content-Type: application/json
{ "image1": "http://ww.o.com/1.gif",
"image2": "http://www.foo.be/2.gif",
"image3": "http://www.foo2.foo/4.gif" }
The resource is /archives/1234, that's its name.
It has two representations in this case: the JSON version, and the actual, binary archive. Your service distinguishes between the two using the content type specified in the Accept header. That header is the client telling you what it wants.
When you're done with the archive, simply DELETE it
DELETE /archives/1234
Or you can have the server expire the resource at some later time.
Why not have five separate REST calls?
Seems cleaner and divides more logically. It will also run the downloads in parallel, 2 or more at a time depending on the browser you are using.
They are called REST principles not laws, but no you are not "breaking" them, IMO. REST is about resources being addressable by a URL, and (where appropriate) available in multiple formats. It doesn't say what the format should be. There's a simple description of what REST means in this article.
However, as #Andrey says there are nicer ways to handle sending multiple data objects than inventing your own adhoc format. The Multipart mimeType / format is one alternative, and another is to send the objects packed up as a tar, zip or a similar archive file format.
IMO. the real problem with using "application/octet-stream" and is that it doesn't tell anyone anything about how the data is actually formatted. Rather your client has "know" how it is formatted, and interpret it accordingly. And the problems with inventing your own format are interoperability and (possibly) having to design, implement and maintain libraries to support it, possibly may times over.

Compressing HTTP request with LWP, Apache, and mod_deflate

I have a client/server system that performs communication using XML transferred using HTTP requests and responses with the client using Perl's LWP and the server running Perl's CGI.pm through Apache. In addition the stream is encrypted using SSL with certificates for both the server and all clients.
This system works well, except that periodically the client needs to send really large amounts of data. An obvious solution would be to compress the data on the client side, send it over, and decompress it on the server. Rather than implement this myself, I was hoping to use Apache's mod_deflate's "Input Decompression" as described here.
The description warns:
If you evaluate the request body yourself, don't trust the Content-Length header! The Content-Length header reflects the length of the incoming data from the client and not the byte count of the decompressed data stream.
So if I provide a Content-Length value which matches the compressed data size, the data is truncated. This is because mod_deflate decompresses the stream, but CGI.pm only reads to the Content-Length limit.
Alternatively, if I try to outsmart it and override the Content-Length header with the decompressed data size, LWP complains and resets the value to the compressed length, leaving me with the same problem.
Finally, I attempted to hack the part of LWP which does the correction. The original code is:
# Set (or override) Content-Length header
my $clen = $request_headers->header('Content-Length');
if (defined($$content_ref) && length($$content_ref)) {
$has_content = length($$content_ref);
if (!defined($clen) || $clen ne $has_content) {
if (defined $clen) {
warn "Content-Length header value was wrong, fixed";
hlist_remove(\#h, 'Content-Length');
}
push(#h, 'Content-Length' => $has_content);
}
}
elsif ($clen) {
warn "Content-Length set when there is no content, fixed";
hlist_remove(\#h, 'Content-Length');
}
And I changed the push line to:
push(#h, 'Content-Length' => $clen);
Unfortunately this causes some problem where content (truncated or not) doesn't even get to my CGI script.
Has anyone made this work? I found this which does compression on a file before uploading, but not compressing a generic request.
Although you said you didn't want to do the compression yourself, there are lots of perl modules which will do both sides for you, Compress::Zlib for example.
I have a cheat (with a .net part of the company) where I get passed XML as a separate parameter posted in, then can handle it as if it was a string rather than faffing about with SOAP like stuff.
I don't think you can change the Content-Length like that. It would confuse Apache, because mod_deflate wouldn't know how much compressed data to read. What about having the client add an X-Uncompressed-Length header, and then use a modified version of CGI.pm that uses X-Uncompressed-Length (if present) instead of Content-Length? (Actually, you probably don't need to modify CGI.pm. Just set $ENV{'CONTENT_LENGTH'} to the appropriate value before initializing the CGI object or calling any CGI functions.)
Or, use a lower-level module that uses the bucket brigade to tell how much data to read.
I am not sure if I am following you with what you want, but I have a custom get/post module, that I use to do some non standard stuff. The below code will read in anything sent via post, or STDIN.
read(STDIN, $query_string, $ENV{'CONTENT_LENGTH'});
Instead of using using $ENV's value use your's. I hope this helps, and sorry if it doesn't.