Should I encode HTTP request headers?

I know that URLs need to be encoded before being sent to the server. But do I need to encode HTTP request headers as well? I'm sending a base64 string in a header that contains = as %3d... Is that correct?

No. Percent-encoding applies to URIs, not to HTTP header field values, so the = can be sent as-is. The field value restrictions are similar, but not exactly the same: for instance, you can't use a bare "%", or "<" and ">", in a URI, but you can use them in a header field value.
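As a concrete illustration, here is a minimal Python sketch (the endpoint URL is made up) that sends a Base64 value in a header without any percent-encoding:
import base64
import requests

token = base64.b64encode(b"user:password").decode("ascii")  # ends with "=" padding
# Header field values are not percent-encoded; the "=" is sent as-is.
response = requests.get(
    "https://api.example.org/resource",  # hypothetical endpoint
    headers={"Authorization": "Basic " + token},
)
print(response.status_code)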


Why do you use base64 URL encoding with JSON web tokens?

The Scenario:
I'm reading about JSON web tokens at this link (https://medium.com/vandium-software/5-easy-steps-to-understanding-json-web-tokens-jwt-1164c0adfcec). It outlines how to create a JSON web token: you create a header and a payload, and then create a signature using the following pseudocode:
data = base64urlEncode( header ) + "." + base64urlEncode( payload )
hashedData = hash( data, secret )
signature = base64urlEncode( hashedData )
My Question:
Why does the pseudocode use base64urlEncode when creating data and signature?
Scope Of What I Understand So Far:
Base64 allows you to express binary data using a set of 64 text characters. This is usually used when you have data that you want to pass through some channel that might misinterpret some of the characters, but would not misinterpret Base64 characters, so you encode it using Base64 so that the data won't get misinterpreted. Base64 URL encoding, on the other hand, is analogous to Base64 encoding except that you use a variant of the Base64 character set that does not include characters that have special meaning in URLs, so that if you use the Base64 URL encoded string in a URL, its meaning won't get misinterpreted.
Assuming my understanding there is correct, I'm trying to understand why base64urlEncode() is used in computing data and signature in the pseudocode above. Is the signature of a JSON web token going to be used somewhere in a URL? If so, why is data base64url-encoded as well before hashing? Why not just encode the signature? Is there something about the hash function that would require its data parameter to be Base64 URL encoded?
When using the OAuth Implicit Grant, JWTs may be transferred as part of URL fragments.
That is just an example, but I guess in general it was presumed that JWTs might be passed through URLs, so base64url-encoding them makes sense.
The first line of the IETF JWT standard abstract even says:
JSON Web Token (JWT) is a compact, URL-safe means of representing claims to be transferred between two parties.
(Note that the OAuth Implicit Grant is no longer recommended to be used.)
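For illustration, here is a minimal Python sketch of the pseudocode above, assuming HS256 (HMAC-SHA256) as the hash and made-up header, payload, and secret values:
import base64
import hashlib
import hmac
import json

def base64url_encode(raw):
    # URL-safe Base64 with the padding stripped, as JWTs use it
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "1234567890", "name": "Jane Doe"}
secret = b"my-secret"  # made-up shared secret

data = (base64url_encode(json.dumps(header).encode())
        + "." + base64url_encode(json.dumps(payload).encode()))
hashed_data = hmac.new(secret, data.encode("ascii"), hashlib.sha256).digest()
signature = base64url_encode(hashed_data)

token = data + "." + signature  # safe to embed in a URL fragment
print(token)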

How does an email client read the content-type headers for encoding?

It is possible to send an email with different content types: text/html, text/plain, mime, etc. It is also possible to use different encodings, including (according to the RFCs) for header fields: US-ASCII, UTF-8, etc.
How do you solve the chicken and egg problem? The content-type header is just one of several headers. If the headers can be any encoding, how does a mail server or client know how to read the content-type header if it does not know what encoding the headers themselves are in?
I could see it if, say, the first line had to be the Content-Type header and had to be in a pre-agreed encoding (e.g. ASCII), but that is not the case.
How do you parse a stream of bytes whose encoding is embedded as a string inside that very same stream?
Headers are defined to be in ASCII. They can be in UTF-8 if agreed to out of band, such as via the SMTP or IMAP UTF-8 capability extensions.
Internationalization in headers is performed via "encoded words", where the encoding is part of the header data. (This looks like a string such as =?iso8859-1?q?sample_header_data?=.) See RFC 2047.
The Content-Type header does not apply to the headers themselves, only to the body content.
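As a small illustration, Python's standard library can decode such an encoded word; this sketch just decodes the example string quoted above (in Q-encoding, "_" stands for a space):
from email.header import decode_header

raw = "=?iso8859-1?q?sample_header_data?="

decoded = "".join(
    chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
    for chunk, charset in decode_header(raw)
)
print(decoded)  # sample header data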

Need to find the requests equivalent of urlopen() from urllib2

I am currently trying to modify a script to use the requests library instead of the urllib2 library. I haven't really used it before and I am looking to do the equivalent of urlopen("http://www.example.org").read(), so I tried requests.get("http://www.example.org").text.
This works fine with normal everyday html, however when I fetch from this url (https://gtfsrt.api.translink.com.au/Feed/SEQ) it doesn't seem to work.
So I wrote the below code to print out the responses from the same url using both the requests and urllib2 libraries.
import urllib2
import requests
#urllib2 request
request = urllib2.Request("https://gtfsrt.api.translink.com.au/Feed/SEQ")
result = urllib2.urlopen(request)
#requests request
result2 = requests.get("https://gtfsrt.api.translink.com.au/Feed/SEQ")
print result2.encoding
#urllib2 write to text
open("Output.txt", 'w').close()
text_file = open("Output.txt", "w")
text_file.write(result.read())
text_file.close()
open("Output2.txt", 'w').close()
text_file = open("Output2.txt", "w")
text_file.write(result2.text)
text_file.close()
The urlopen().read() works fine, but the requests.get().text doesn't work for this url. I suspect it has something to do with encoding, but I don't know what. Any thoughts?
Note: The supplied url is a feed in the Google protocol buffer format; once I receive the message I give the feed to a Google library that interprets it.
Your issue is that you're making the requests module interpret binary content in a response as text.
A response from the requests library has two main ways to access the body of the response:
Response.content - will return the response body as a bytestring
Response.text - will decode the response body as text and return unicode
Since protocol buffers are a binary format, you should use result2.content in your code instead of result2.text.
Response.content will return the body of the response as-is, in bytes. For binary content this is exactly what you want. For text content that contains non-ASCII characters, this means the content must have been encoded by the server into a bytestring using a particular encoding that is indicated by either an HTTP header or a <meta charset="..." /> tag. In order to make sense of those bytes, they therefore need to be decoded after receiving, using that charset.
Response.text is a convenience property that does exactly this for you: it assumes the response body is text, looks at the response headers to find the encoding, and decodes it for you, returning unicode.
But if your response doesn't contain text, this is the wrong method to use. Binary content doesn't contain characters, because it's not text, so the whole concept of character encoding does not make any sense for binary content - it's only applicable to text composed of characters. (That's also why you're seeing response.encoding == None - it's just bytes, there is no character encoding involved).
See Response Content and Binary Response Content in the requests documentation for more details.
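A minimal sketch of that fix, assuming you just want the raw feed bytes on disk (the output filename is arbitrary):
import requests

result2 = requests.get("https://gtfsrt.api.translink.com.au/Feed/SEQ")

# .content is the undecoded protocol-buffer payload, so write it as bytes ("wb")
with open("Output2.bin", "wb") as feed_file:
    feed_file.write(result2.content)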

Zend Framework Mail: Base64 header encoding on incoming mails

I am trying to download emails from my POP3/IMAP accounts using Zend Framework 1.12 and it's working fine. QP-encoded header fields are decoded automatically. However, when a header field (from name or subject) is base64 encoded like this:
=?UTF-8?B?c3DEvsWIYcWl?=
it will not automatically be base64-decoded. I don't know why. While it would be easy to fix this "my way", I would like to do it right.
Can anybody recommend a good approach how to deal with base64 headers?
Thanks a lot.
You can use the iconv_mime_decode_headers() PHP function.
$decoded = iconv_mime_decode_headers('Subject: '.$subject, 0, "UTF-8");
var_dump($decoded['Subject']);
Note that you can pass multiple header lines to one function call by separating them with a newline ("\n"), e.g.
$headers = "Subject: {$subject}\nFrom: {$from}";
$decoded = iconv_mime_decode_headers($headers, 0, "UTF-8");
In this case you will get an array with the keys "Subject" and "From" containing the decoded data.
It's the responsibility of mail MIME parsers to decode the mail headers. There are open source base64 decoders available on the net which can be used to decode these strings.

Extracting email attachment filename : Content-Disposition vs Content-type

I am working on a script that will handle email attachments. I see that, most of the time, both content-type and content-disposition headers have the filename, but I have seen cases where only one had proper encoding or valid mime header.
Is there a preferred header to use to extract the file name? If so, which one?
Quoting Wikipedia (http://en.wikipedia.org/wiki/MIME):
"Many mail user agents also send messages with the file name in the name parameter of the content-type header instead of the filename parameter of the content-disposition header. This practice is discouraged."
So it seems content-disposition is preferred. However, as I am using JavaMail, the current JavaMail API seems to have only a String getDisposition() method: http://javamail.kenai.com/nonav/javadocs/javax/mail/Part.html#getDisposition(). So you might need to work with the header directly if you are using JavaMail.
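For comparison, Python's standard email package implements exactly this preference: Message.get_filename() takes the Content-Disposition filename parameter first and only falls back to the Content-Type name parameter when filename is absent. A small self-contained sketch:
import email
from email import policy

raw = (b'Content-Type: application/octet-stream; name="fallback.bin"\r\n'
       b'Content-Disposition: attachment; filename="report.pdf"\r\n'
       b'\r\n'
       b'dummy body')
part = email.message_from_bytes(raw, policy=policy.default)

# The Content-Disposition filename wins; "fallback.bin" would only be
# returned if the filename parameter were missing.
print(part.get_filename())  # report.pdf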