How to define the base64 content encoding in a REST API? - rest

I would like to do a REST resource that could be represented as:
PDF (binary for download);
JSON (with structured data);
PDF (base64 encode)
My expectation is to create a URL like this:
http://api.mybank.com/accounts/{accountId}/statement?fromDate=2019-11
The Accept header as 'application/json' will return the statement information.
The Accept header as 'application/pdf' will return the statement in a PDF binary file.
And PDF encoded as base64? How I define this?
Content-Transfer-Encoding is not an HTTP, but email header.
Transfer-Encoding is used only for compression.

Related

Headers for REST API with optional Base64 encoding

We have a media file repository, with which other services communicate over a REST API. For various reasons we want the users of the repository to be able to upload and download files over HTTP both directly (plaintext for text files and byte array for binary files) and using Base64 encoding. We want the fact that the file is uploaded (PUT, POST) and requested for download (GET) in the Base64 encoding be reflected in the header of the HTTP request.
How do we reflect the fact that the content of the request or requested response is Base64 encoded in the HTTP header?
So far I'm tending towards appending ;base64 after the mime type in the Content-Type header, for example Content-Type: image/png;base64. Other options (X- header, Content-Encoding) are discussed in this related question but do not offer satisfactory resolution to our question.
You have to use Content-Transfer-Encoding header.
It is in RFC https://www.rfc-editor.org/rfc/rfc2045#page-14.
It supports base64 value among others, like "7bit" / "8bit" / "binary" / "quoted-printable" / "base64" / ietf-token / x-token
This header is specially designed for your case, to use as a complement for MIME type.

How does an email client read the content-type headers for encoding?

It is possible to send an email with different content types: text/html, text/plain, mime, etc. It also is possible to use different encodings, including (according to the RFCs) for header fields: us-ascii, utf8, etc.
How do you solve the chicken and egg problem? The content-type header is just one of several headers. If the headers can be any encoding, how does a mail server or client know how to read the content-type header if it does not know what encoding the headers themselves are in?
I can see it if the first line, e.g. had to be the content-type and it had to be in a pre-agreed encoding, (e.g. ascii), but that is not the case.
How do you parse a stream of bytes whose encoding is embedded as a string inside that very same stream?
Headers are defined to be in ascii. They can be in utf-8 if agreed to out of band, such as via the smtp or imap utf-8 capability extensions.
Internationalization in headers is performed via "encoded words", where the encoding is part of the header data. (This looks like a string such as =?iso8859-1?q?sample_header_data?=). See rfc2047.
Content Type headers do not apply to headers themselves, only the body content.

Need to find the requests equivalent of openurl() from urllib2

I am currently trying to modify a script to use the requests library instead of the urllib2 library. I haven't really used it before and I am looking to do the equivalent of urlopen("http://www.example.org").read(), so I tried the requests.get("http://www.example.org").text function.
This works fine with normal everyday html, however when I fetch from this url (https://gtfsrt.api.translink.com.au/Feed/SEQ) it doesn't seem to work.
So I wrote the below code to print out the responses from the same url using both the requests and urllib2 libraries.
import urllib2
import requests
#urllib2 request
request = urllib2.Request("https://gtfsrt.api.translink.com.au/Feed/SEQ")
result = urllib2.urlopen(request)
#requests request
result2 = requests.get("https://gtfsrt.api.translink.com.au/Feed/SEQ")
print result2.encoding
#urllib2 write to text
open("Output.txt", 'w').close()
text_file = open("Output.txt", "w")
text_file.write(result.read())
text_file.close()
open("Output2.txt", 'w').close()
text_file = open("Output2.txt", "w")
text_file.write(result2.text)
text_file.close()
The openurl().read() works fine but the requests.get().text doesn't work for the given this url. I suspect it has something to do with encoding, but i don't know what. Any thoughts?
Note: The supplied url is a feed in the google protocol buffer format, once I receive the message i give the feed to a google library that interprets it.
Your issue is that you're making the requests module interpret binary content in a response as text.
A response from the requests library has two main way to access the body of the response:
Response.content - will return the response body as a bytestring
Response.text - will decode the response body as text and return unicode
Since protocol buffers are a binary format, you should use result2.content in your code instead of result2.text.
Response.content will return the body of the response as-is, in bytes. For binary content this is exactly what you want. For text content that contains non-ASCII characters this means the content must have been encoded by the server into a bytestring using a particular encoding that is indicated by either a HTTP header or a <meta charset="..." /> tag. In order to make sense of those bytes they therefore need to be decoded after receiving using that charset.
Response.text now is a convenience method that does exactly this for you. It assumes the response body is text, and looks at the response headers to find the encoding, and decodes it for you, returning unicode.
But if your response doesn't contain text, this is the wrong method to use. Binary content doesn't contain characters, because it's not text, so the whole concept of character encoding does not make any sense for binary content - it's only applicable to text composed of characters. (That's also why you're seeing response.encoding == None - it's just bytes, there is no character encoding involved).
See Response Content and Binary Response Content in the requests documentation for more details.

Postman Chrome: What is the difference between form-data, x-www-form-urlencoded and raw

I am using the Postman Chrome extension for testing a web service.
There are three options available for data input.
I guess the raw is for sending JSON.
What is the difference between the other two, form-data and x-www-form-urlencoded?
These are different Form content types defined by W3C.
If you want to send simple text/ ASCII data, then x-www-form-urlencoded will work. This is the default.
But if you have to send non-ASCII text or large binary data, the form-data is for that.
You can use Raw if you want to send plain text or JSON or any other kind of string. Like the name suggests, Postman sends your raw string data as it is without modifications. The type of data that you are sending can be set by using the content-type header from the drop down.
Binary can be used when you want to attach non-textual data to the request, e.g. a video/audio file, images, or any other binary data file.
Refer to this link for further reading:
Forms in HTML documents
This explains better:
Postman docs
Request body
While constructing requests, you would be dealing with the request body editor a lot. Postman lets you send almost any kind of HTTP request (If you can't send something, let us know!). The body editor is divided into 4 areas and has different controls depending on the body type.
form-data
multipart/form-data is the default encoding a web form uses to transfer data. This simulates filling a form on a website, and submitting it. The form-data editor lets you set key/value pairs (using the key-value editor) for your data. You can attach files to a key as well. Do note that due to restrictions of the HTML5 spec, files are not stored in history or collections. You would have to select the file again at the time of sending a request.
urlencoded
This encoding is the same as the one used in URL parameters. You just need to enter key/value pairs and Postman will encode the keys and values properly. Note that you can not upload files through this encoding mode. There might be some confusion between form-data and urlencoded so make sure to check with your API first.
raw
A raw request can contain anything. Postman doesn't touch the string entered in the raw editor except replacing environment variables. Whatever you put in the text area gets sent with the request. The raw editor lets you set the formatting type along with the correct header that you should send with the raw body. You can set the Content-Type header manually as well. Normally, you would be sending XML or JSON data here.
binary
binary data allows you to send things which you can not enter in Postman. For example, image, audio or video files. You can send text files as well. As mentioned earlier in the form-data section, you would have to reattach a file if you are loading a request through the history or the collection.
UPDATE
As pointed out by VKK, the WHATWG spec say urlencoded is the default encoding type for forms.
The invalid value default for these attributes is the application/x-www-form-urlencoded state. The missing value default for the enctype attribute is also the application/x-www-form-urlencoded state.
Here are some supplemental examples to see the raw text that Postman passes in the request. You can see this by opening the Postman console:
form-data
Header
content-type: multipart/form-data; boundary=--------------------------590299136414163472038474
Body
key1=value1key2=value2
x-www-form-urlencoded
Header
Content-Type: application/x-www-form-urlencoded
Body
key1=value1&key2=value2
Raw text/plain
Header
Content-Type: text/plain
Body
This is some text.
Raw json
Header
Content-Type: application/json
Body
{"key1":"value1","key2":"value2"}
multipart/form-data
Note. Please consult RFC2388 for additional information about file uploads, including backwards compatibility issues, the relationship between "multipart/form-data" and other content types, performance issues, etc.
Please consult the appendix for information about security issues for forms.
The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
The content type "multipart/form-data" follows the rules of all multipart MIME data streams as outlined in RFC2045. The definition of "multipart/form-data" is available at the [IANA] registry.
A "multipart/form-data" message contains a series of parts, each representing a successful control. The parts are sent to the processing agent in the same order the corresponding controls appear in the document stream. Part boundaries should not occur in any of the data; how this is done lies outside the scope of this specification.
As with all multipart MIME types, each part has an optional "Content-Type" header that defaults to "text/plain". User agents should supply the "Content-Type" header, accompanied by a "charset" parameter.
application/x-www-form-urlencoded
This is the default content type. Forms submitted with this content type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by +', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., %0D%0A'). The control names/values are listed in the order they appear in the document. The name is separated from the value by =' and name/value pairs are separated from each other by `&'.
application/x-www-form-urlencoded the body of the HTTP message sent to the server is essentially one giant query string -- name/value pairs are separated by the ampersand (&), and names are separated from values by the equals symbol (=). An example of this would be:
MyVariableOne=ValueOne&MyVariableTwo=ValueTwo
The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.
let's take everything easy, it's all about how a http request is made:
1- x-www-form-urlencoded
http request:
GET /getParam1 HTTP/1.1
User-Agent: PostmanRuntime/7.28.4
Accept: */*
Postman-Token: a14f1286-52ae-4871-919d-887b0e273052
Host: localhost:12345
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 55
postParam1Key=postParam1Val&postParam2Key=postParam2Val
2- raw
http request:
GET /getParam1 HTTP/1.1
Content-Type: text/plain
User-Agent: PostmanRuntime/7.28.4
Accept: */*
Postman-Token: e3f7514b-3f87-4354-bcb1-cee67c306fef
Host: localhost:12345
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Length: 73
{
postParam1Key: postParam1Val,
postParam2Key: postParam2Val
}
3- form-data
http request:
GET /getParam1 HTTP/1.1
User-Agent: PostmanRuntime/7.28.4
Accept: */*
Postman-Token: 8e2ce54b-d697-4179-b599-99e20271df90
Host: localhost:12345
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Content-Type: multipart/form-data; boundary=--------------------------140760168634293019785817
Content-Length: 181
----------------------------140760168634293019785817
Content-Disposition: form-data; name="postParam1Key"
postParam1Val
----------------------------140760168634293019785817--

How to decode response in AnyEvent::HTTP?

I'm writing a simple script using AnyEvent::HTTP which fetches several HTML-pages in parallel. The module doesn't do any content decoding, it just provides the response body and headers to the callback.
What would be the proper way to decode the response into Perl's Unicode strings? Should I rely on the "Content-Type" header field or meta tags?