How to decode response in AnyEvent::HTTP? - perl

I'm writing a simple script using AnyEvent::HTTP which fetches several HTML pages in parallel. The module doesn't do any content decoding; it just passes the response body and headers to the callback.
What would be the proper way to decode the response into Perl's Unicode strings? Should I rely on the "Content-Type" header field or meta tags?

Related

HTTPie returning "Error processing request. All request parts must have the content-type header set."

I'm testing an API with HTTPie. The implementation notes of the method I'm trying to use state that it accepts a multipart query containing a model in JSON format (Content-Type=application/json) and one or several files (Content-Type=application/octet-stream). I'm trying to post a file accompanied by a model in JSON. According to what I understood from the HTTPie documentation, the way to do it is to pass it as a form:
http --form POST https://smartcat.ai/api/integration/v1/project/document documentModel#/path/to/json/file taskfile#/path/to/file\ projectId==xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx
where projectId is a parameter to be sent as query string.
I've tried to set a Content-Type header for each file, however it doesn't seem right, as I expected the --form flag to set content-type as multipart/form-data, according to the documentation.
I'm sure I'm missing something basic, so any ideas on the direction to follow and how to better understand mime types are very welcome.

How to handle error responses in a REST endpoint that accepts different Accept header values.

I'm trying to add a new content type to a REST endpoint. Currently it only returns JSON, but I now need it to also be able to return a CSV file.
As far as I know, the best way to do this is by using the Accept header with value text/csv and then add a converter that is able to react to this and convert the returned body to the proper CSV representation.
I've been able to do this, but now I have a problem handling exceptions. Up until now, all the errors returned have been in JSON. The frontend expects any 500 status code to contain a specific body with the error. But now that the endpoint can return either application/json or text/csv, in case of an error the converter used to transform the body is going to be either the Jackson converter or my custom one, depending on the Accept header passed. Moreover, my frontend is going to need to read the content-type returned and parse the value based on the type of representation returned.
Is this the normal approach to handle this situation?
A faster workaround would be to forget about the Accept header and include a url parameter indicating the format expected. Doing it this way, I'd be able to change the content-type of the response and the parsing of the data directly in the controller as the GET request won't include any Accept header and it will be able to accept anything. There are some parts of the code already doing this where the only expected response format is CSV so I'm going to have a difficult time defending the use of the Accept header unless there is a better way of handling this.
"my frontend is going to need to read the content-type returned and parse the value based on the type of representation returned.
Is this the normal approach to handle this situation?"
Yes.
For example, RFC 7807 describes a common format for describing problems. So the server would send an application/problem+json or an application/problem+xml representation of the issue in the response, along with the usual metadata in the headers.
Consumers that understand application/problem+json can parse the data within, and forward a useful description of the problem to the user, the logs, or wherever else it is needed. Consumers that don't understand that representation are limited to acting on the information in the headers.
"A faster workaround would be to forget about the Accept header and include a url parameter indicating the format expected."
That's also fine. More precisely, you can have a different resource responsible for each of the different media types that you support.
It may be useful to review section 3.4 of RFC 7231, which describes the semantics of content negotiation.
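As a rough illustration of a consumer that picks its parser from the Content-Type of the response, here is a minimal Python sketch. The function names (media_type, parse_body) and the sample payloads are hypothetical, and it assumes the server sends UTF-8 bodies; a real client would take the header and body from its HTTP library.

```python
import csv
import io
import json
from email.message import Message


def media_type(content_type_header):
    """Return the bare media type (e.g. 'text/csv') without parameters."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


def parse_body(content_type_header, body):
    """Choose a parser from the response's Content-Type header."""
    kind = media_type(content_type_header)
    if kind in ("application/problem+json", "application/json"):
        # Assumption: the server encodes JSON bodies as UTF-8.
        return kind, json.loads(body.decode("utf-8"))
    if kind == "text/csv":
        rows = list(csv.reader(io.StringIO(body.decode("utf-8"))))
        return kind, rows
    # Unknown representation: the caller can only act on status and headers.
    return kind, None


# A failed request answered with a problem document (made-up payload).
kind, problem = parse_body(
    "application/problem+json; charset=utf-8",
    b'{"title": "Export failed", "status": 500}',
)
print(kind, problem["title"])

# A successful request answered with CSV (made-up payload).
kind, rows = parse_body("text/csv", b"id,name\r\n1,Ada\r\n")
print(kind, rows)
```

The point of the sketch is that the decision is driven by what the server actually returned, not by what the client asked for in Accept.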

How does an email client read the content-type headers for encoding?

It is possible to send an email with different content types: text/html, text/plain, mime, etc. It also is possible to use different encodings, including (according to the RFCs) for header fields: us-ascii, utf8, etc.
How do you solve the chicken and egg problem? The content-type header is just one of several headers. If the headers can be any encoding, how does a mail server or client know how to read the content-type header if it does not know what encoding the headers themselves are in?
I can see it if the first line, e.g. had to be the content-type and it had to be in a pre-agreed encoding, (e.g. ascii), but that is not the case.
How do you parse a stream of bytes whose encoding is embedded as a string inside that very same stream?
Headers are defined to be in ASCII. They can be in UTF-8 if agreed to out of band, such as via the SMTP or IMAP UTF-8 capability extensions.
Internationalization in headers is performed via "encoded words", where the encoding is part of the header data. (This looks like a string such as =?iso-8859-1?q?sample_header_data?=). See RFC 2047.
Content-Type headers do not apply to the headers themselves, only to the body content.
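To make the encoded-word mechanism concrete, here is a small Python sketch using the standard library's email.header module. The helper name decode_header_value and the sample header values are made up for illustration.

```python
from email.header import decode_header, make_header


def decode_header_value(raw):
    """Decode RFC 2047 encoded words in an ASCII header value to a str."""
    # decode_header splits the value into (data, charset) chunks;
    # make_header reassembles them into a single Unicode-aware Header.
    return str(make_header(decode_header(raw)))


# The raw header is pure ASCII; the charset travels inside the encoded word.
print(decode_header_value("=?iso-8859-1?q?caf=E9_menu?="))

# A header without encoded words passes through unchanged.
print(decode_header_value("plain text"))
```

Because the wire format is always ASCII, the parser never needs to know a charset before it starts reading the headers.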

Need to find the requests equivalent of urlopen() from urllib2

I am currently trying to modify a script to use the requests library instead of the urllib2 library. I haven't really used it before and I am looking to do the equivalent of urlopen("http://www.example.org").read(), so I tried the requests.get("http://www.example.org").text function.
This works fine with normal everyday html, however when I fetch from this url (https://gtfsrt.api.translink.com.au/Feed/SEQ) it doesn't seem to work.
So I wrote the below code to print out the responses from the same url using both the requests and urllib2 libraries.
import urllib2
import requests
#urllib2 request
request = urllib2.Request("https://gtfsrt.api.translink.com.au/Feed/SEQ")
result = urllib2.urlopen(request)
#requests request
result2 = requests.get("https://gtfsrt.api.translink.com.au/Feed/SEQ")
print result2.encoding
#urllib2 write to text
open("Output.txt", 'w').close()
text_file = open("Output.txt", "w")
text_file.write(result.read())
text_file.close()
open("Output2.txt", 'w').close()
text_file = open("Output2.txt", "w")
text_file.write(result2.text)
text_file.close()
The urlopen().read() call works fine, but requests.get().text doesn't work for the given URL. I suspect it has something to do with encoding, but I don't know what. Any thoughts?
Note: The supplied URL is a feed in the Google protocol buffer format; once I receive the message, I give the feed to a Google library that interprets it.
Your issue is that you're making the requests module interpret binary content in a response as text.
A response from the requests library has two main ways to access the body of the response:
Response.content - will return the response body as a bytestring
Response.text - will decode the response body as text and return unicode
Since protocol buffers are a binary format, you should use result2.content in your code instead of result2.text.
Response.content will return the body of the response as-is, in bytes. For binary content this is exactly what you want. For text content that contains non-ASCII characters this means the content must have been encoded by the server into a bytestring using a particular encoding that is indicated by either an HTTP header or a <meta charset="..." /> tag. In order to make sense of those bytes, they therefore need to be decoded using that charset after they are received.
Response.text is a convenience accessor that does exactly this for you. It assumes the response body is text, looks at the response headers to find the encoding, and decodes the body for you, returning unicode.
But if your response doesn't contain text, this is the wrong method to use. Binary content doesn't contain characters, because it's not text, so the whole concept of character encoding does not make any sense for binary content - it's only applicable to text composed of characters. (That's also why you're seeing response.encoding == None - it's just bytes, there is no character encoding involved).
See Response Content and Binary Response Content in the requests documentation for more details.
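The effect can be reproduced without any network access. The following Python 3 sketch uses made-up bytes (not real feed data) and plain bytes.decode to imitate, roughly, what a text accessor does to a body; it is an illustration of the failure mode, not the requests implementation.

```python
# Made-up bytes standing in for a binary (protocol buffer style) payload.
binary_body = bytes([0x0A, 0x03, 0xFF, 0xFE, 0x80])

# Roughly what treating the body as text does: decode it, substituting
# a replacement character for every byte that is not valid in the charset.
as_text = binary_body.decode("utf-8", errors="replace")
round_trip = as_text.encode("utf-8")

# The original bytes cannot be recovered from the decoded text.
print(round_trip == binary_body)

# Genuine text survives the same round trip unchanged.
text_body = "café".encode("utf-8")
print(text_body.decode("utf-8").encode("utf-8") == text_body)
```

This is why a binary parser should be handed the raw bytes (result2.content in the question's code) rather than the decoded text.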

How to deal with a media-type for error messages of the Cowboy REST handler

I want the user to be able to choose the format in which they receive a response from the server, whether it is plain text, JSON or XML. It looks like I must retrieve media_type by calling cowboy_req:meta/{2,3} and then use it to encode the response body. But that value isn't available in the callbacks that run before content_types_provided (malformed_request, is_authorized, forbidden...).
Should I duplicate Cowboy's logic and write my own code to determine media_type?
Or ignore all callbacks that are executed before the media_type has been determined?
Or maybe I should place my response message into the request metadata, encode it in the onresponse hook, and then replace the response body?
How should I do that?
I think you are not quite right. Right from the init/3 and rest_init/3 functions, the Request parameter is the "full request", and you can read any header or meta value in each callback.
Personally, I would go with the header over meta (since there is already a Content-Type header defined, and headers should take precedence over meta).
In general, the REST callbacks in Cowboy should only give you an easily understandable workflow for handling a request, with additional default response codes. In is_authorized/2 all you need to do is check authorization and simply return true or false (as part of a tuple), and Cowboy will either move forward with your logic or return a 401 code. Whether someone is allowed to make a request should not depend on the response format, but if you would still like to do it, just read this meta value from the Req parameter and return true/false based on it.
The only difference with content_types_provided/2 is that you return a kind of binding between Content-Type header values and your functions. I think all you need could be based on this official example