Are email headers case sensitive?

Are email headers case sensitive?
For example, is Content-Type different from Content-type?
I can't find anything about case sensitivity in RFC 5322. However, I'm having a problem creating MIME messages with the PEAR Mail_mime module, and everything points to the fact that our SMTP server uses Content-type and MIME-version instead of Content-Type and MIME-Version. I tried to use another SMTP server (like Gmail), but unfortunately our web servers are firewalled pretty tightly.

RFC 5322 does actually specify this, but it is very indirect.
Section 1.2.2 says:
This specification uses the Augmented Backus-Naur Form (ABNF) [RFC5234] notation for the formal definitions of the syntax of messages.
In turn, Section 2.3 of RFC 5234 says:
NOTE: ABNF strings are case insensitive and the character set for these strings is US-ASCII.
So when RFC 5322 specifies a production rule like this:
from = "From:" mailbox-list CRLF
It is implicit that the "From:" is not case-sensitive.
[update]
As for Content-Type and MIME-Version, they are specified by the MIME spec (RFC 2045). That in turn refers to the BNF described by the original RFC 822, which (luckily) also makes it clear that these literal strings are case-insensitive.
Bottom line: According to the spec, email header field names are not case-sensitive, so it sounds like your mail server is buggy.
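As a quick illustration of what a conforming parser does, here is a minimal sketch using Python's standard email package. The sample message is made up for this example; the point is only that lookups by field name succeed regardless of the capitalization used on the wire.

```python
from email import message_from_string

# A made-up message that uses the "unusual" capitalization from the question.
raw = (
    "MIME-version: 1.0\n"
    "Content-type: text/plain; charset=us-ascii\n"
    "\n"
    "Hello\n"
)

msg = message_from_string(raw)

# Field-name lookups are case-insensitive, as the ABNF rules imply.
content_type = msg["Content-Type"]
mime_version = msg["MIME-VERSION"]

print(content_type)
print(mime_version)
# The original spelling is still preserved when the headers are listed.
print(msg.keys())
```

If a library or server treats these spellings as different fields, that is a bug in that component rather than something the message format allows.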


RFC 5322 email format validation

How can I check whether email addresses generated by my code are valid according to RFC 5322?
Here's a PCRE regular expression (taken from a PHP library) that will validate according to RFC 5322:
'/^(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){255,})(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){65,}@)((?>(?>(?>((?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z\d-]{64,})(?1)(?>([a-z\d](?>[a-z\d-]*[a-z\d])?)(?>(?1)\.(?!(?1)[a-z\d-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f\d]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f\d][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f\d]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?9)){3}))\])(?1)$/isD'
Unlike Peter's answer it does allow for single-label domain names (which are syntactically valid) and IPv6 address literals.
However, I'd strongly suggest validating against RFC 5321 instead. It doesn't allow comments or folding white space (which are semantically invisible and so not actually part of the email address), nor obsolete local parts (which can simply be rewritten as non-obsolete quoted strings):
'/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z\d](?>[a-z\d-]*[a-z\d])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f\d]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f\d][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f\d]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?6)){3}))\])$/iD'
The following regex is about 98% accurate. It does not accept these valid addresses:
postbox@com
admin@mailserver1
user@[IPv6:2001:db8:1ff::a0b:dbd0]
But it covers everything else:
^(([^<>()[\\]\\.,;:\\s@\"]+(\\.[^<>()[\\]\\.,;:\\s@\"]+)*)|(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$
Note: This was copied directly from production Go code, so the backslashes and double quotes are escaped for a Go string literal.
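To see which addresses this simpler pattern accepts, here is a rough Python sketch. It is my own transcription of the pattern with the Go string escaping removed and a literal @ as the separator, so treat it as an approximation rather than the exact production code; the sample addresses other than the three listed above are made up.

```python
import re

# Transcription of the simpler pattern (assumption: Go escaping removed,
# "[" escaped inside character classes to keep Python's re module quiet).
PATTERN = re.compile(
    r'^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))'
    r'@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])'
    r'|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$'
)


def is_accepted(address):
    """Return True if the pattern matches the whole address."""
    return PATTERN.match(address) is not None


samples = [
    "user@example.com",                    # ordinary address
    "user@[192.168.0.1]",                  # IPv4 address literal
    "postbox@com",                         # single-label domain
    "admin@mailserver1",                   # single-label domain
    "user@[IPv6:2001:db8:1ff::a0b:dbd0]",  # IPv6 address literal
]

for address in samples:
    print(address, is_accepted(address))
```

The last three are rejected because the domain branch requires at least one dot and the literal branch only understands dotted IPv4 addresses.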
Email Regex as per RFC 5322 Policy
After much struggle, I wrote a regex that validates all the cases per RFC 5322 except one:
(1) admin@mailserver1 (a local domain name with no TLD; note that ICANN strongly discourages dotless email addresses)
^(?=.{1,64}#)((?:[A-Za-z0-9!#$%&'*+-/=?^\{\|\}~]+|"(?:\\"|\\\\|[A-Za-z0-9\.!#\$%&'\*\+\-/=\?\^_{|}~ (),:;<>#[].])+")(?:.(?:[A-Za-z0-9!#$%&'*+-/=?^\{\|\}~]+|"(?:\\"|\\\\|[A-Za-z0-9\.!#\$%&'\*\+\-/=\?\^_{|}~ (),:;<>#[].])+")))#(?=.{1,255}.)((?:[A-Za-z0-9]+(?:(?:[A-Za-z0-9-][A-Za-z0-9])?).)+[A-Za-z]{2,})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|)])$
The following link gives a breakdown of this regex:
https://regex101.com/r/7u0dze/1

Is it appropriate or necessary to use percent-encoding with HTTP Headers?

When I'm building RESTful client and servers, is it appropriate or necessary to use percent-encoding with HTTP Headers (request or response), or does this type of encoding just apply to URIs?
Basically no, but see below.
RFC2616 describes percent-encoding only for URIs (search for % or HEX HEX or percent) and it defines the field-value without mentioning percent-encoding.
However, RFC2616 allows arbitrary octets (except CTLs) in the header field value, and has a half-baked statement mentioning MIME encoding (RFC2047) for characters not in ISO-8859-1 (see the definition of TEXT in its Section 2.2). I call that statement "half-baked" because it does not explicitly state that ISO-8859-1 is the mandatory character set for interpreting the octets, yet it normatively requires the use of MIME encoding for characters outside that character set. In practice, neither the use of ISO-8859-1 nor the MIME encoding of header field values seems to be widely supported.
HTTPbis seems to have given up on this, and goes back to US-ASCII for header field values. See this answer for details.
My reading of this is:
For standard header fields (those defined in RFC2616), percent-encoding is not permitted.
For extension header fields, percent-encoding is not described in RFC2616, but there is room for applying all kinds of encodings, including percent-encoding, as long as the resulting characters are US-ASCII (if you want to be future-proof). Just don't think you have to use percent-encoding.
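As a sketch of what this could look like for an extension header, the snippet below percent-encodes the UTF-8 bytes of a value so that only US-ASCII goes on the wire, and decodes it again on the other side. The header name X-Display-Name and the helper names are hypothetical; nothing in RFC2616 prescribes this particular scheme, so both ends have to agree on it.

```python
from urllib.parse import quote, unquote


def encode_header_value(text):
    """Percent-encode the UTF-8 bytes of text so the result is pure US-ASCII."""
    encoded = quote(text, safe="")
    if not encoded.isascii():
        raise ValueError("encoded value is not US-ASCII")
    return encoded


def decode_header_value(value):
    """Reverse encode_header_value on the receiving side."""
    return unquote(value)


# Hypothetical extension header carrying a non-ASCII value.
header_line = "X-Display-Name: " + encode_header_value("naïve café")
print(header_line)
```

Note that a literal % in the original text is also encoded (as %25), which is what makes the round trip unambiguous.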
Some more sources I found:
https://www.quora.com/Do-HTTP-headers-need-to-be-encoded confirms my understanding, although it is not specific about standard headers vs extension headers and does not quote a source.
https://support.ca.com/us/knowledge-base-articles.TEC1904612.html argues that the percent-encoding of extension headers in their product is a protective measure against cross-site scripting (XSS) attacks.
TL;DR: Percent-encoding of octets and base64 encoding are both fine.
For percent-encoding, see RFC 8187, "Indicating Character Encoding and Language for HTTP Header Field Parameters":
https://www.rfc-editor.org/rfc/rfc8187
This document specifies an encoding suitable for use in HTTP header fields...
See Section 3.2.3, "Examples".
Base64 encoding is fine too; see the HTTP Basic authentication spec (RFC 7617): https://www.rfc-editor.org/rfc/rfc7617
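The two encodings mentioned here can be sketched in a few lines of Python. The function names are my own, and the first function over-encodes slightly (it percent-encodes everything outside letters, digits and "_.-~"), which is allowed but not required by RFC 8187.

```python
import base64
from urllib.parse import quote


def rfc8187_ext_value(text, language=""):
    """Build an ext-value: charset, optional language tag, percent-encoded UTF-8."""
    return "UTF-8'" + language + "'" + quote(text, safe="", encoding="utf-8")


def basic_credentials(user, password):
    """Build the value of an Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


# Extended parameter, marked by the trailing "*" on the parameter name.
print("title*=" + rfc8187_ext_value("£ rates"))
print("Authorization: " + basic_credentials("Aladdin", "open sesame"))
```

In both cases the header value that goes on the wire consists of US-ASCII characters only.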

What is the RFC 822 format for the email addresses?

I have to make a regular expression for the email addresses (RFC 822) and I want to know which characters are allowed in the local part and in the domain.
I found this https://www.rfc-editor.org/rfc/rfc822#section-6.1 but I don't see that it says which are the valid characters.
According to RFC 822, the local part may contain any ASCII character, since local-part is defined using word, which is defined as atom / quoted-string; atom covers most ASCII characters, and the rest can be written in a quoted-string. There are syntactic restrictions, but obeying them, any ASCII character can be used.
On similar grounds, RFC 822 allows any ASCII character in the domain part.
On the other hand, RFC 822 was obsoleted in 2001 by RFC 2822, which in turn was obsoleted in 2008 by RFC 5322. The status of RFCs can be checked from the RFC Editor’s RFC database.
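To make the atom / quoted-string distinction concrete, here is a small sketch (the helper name to_word is made up) that writes a string as an RFC 822 word: unchanged if it is already an atom, otherwise as a quoted-string with the double quote, backslash and CR escaped. It handles a single word only; a local part is one or more words separated by dots.

```python
import re

# RFC 822 "specials"; an atom is any run of ASCII characters that are
# not specials, not SPACE and not control characters.
SPECIALS = '()<>@,;:\\".[]'
ATOM = re.compile(r"[^\x00-\x20\x7f" + re.escape(SPECIALS) + r"]+")


def to_word(text):
    """Return text as an RFC 822 word (atom if possible, else quoted-string)."""
    if not text.isascii():
        raise ValueError("RFC 822 only covers ASCII characters")
    if ATOM.fullmatch(text):
        return text
    # Inside a quoted-string, '"', '\' and CR must be backslash-escaped.
    escaped = re.sub(r'(["\\\r])', r"\\\1", text)
    return '"' + escaped + '"'


print(to_word("john"))
print(to_word("john smith"))
print(to_word("a@b"))
```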

Are international characters (e.g. umlaut characters) valid in the local part of email addresses?

Are German umlauts (ä, ö, ü) and the sz character (ß) valid in the local part of an email address?
For example, take this email address: björn.nußbaum@trouble.org
RFC 5322 quite clearly says that umlauts (and other international characters) aren't allowed. Section 3.4.1 says the following about the local part:
local-part = dot-atom / quoted-string / obs-local-part
So what does dot-atom mean? It's described in Section 3.2.3. Long story short: printable US-ASCII characters not including specials.
So I can't see anything regarding international characters anywhere in RFC 5322.
Or is RFC 5322 already obsolete? (RFC 822 -> RFC 2822 -> RFC 5322)
Update:
The important point for me is: what's the current standard? Are international characters allowed or not?
RFC 5322 is marked as DRAFT STANDARD, so I think that's the most recent source to rely on, isn't it?
Efran mentioned that RFC 5336 allows international characters. But RFC 5336 is marked as EXPERIMENTAL, so it isn't of interest to me.
Yes, they are valid characters as long as the mail exchanger responsible for the email address supports the UTF8SMTP extension, discussed in RFC 5336. Beware that only a small portion of the mail exchangers out there support internationalized email addresses.
Both our email validation component for Microsoft .NET and our REST email validation service, for example, allow UTF8 characters in the local part of an email address but will mark it as invalid if its related mail exchanger does not support the aforementioned extension.
https://www.rfc-editor.org/rfc/rfc5322#section-3.4.1 is your latest standards-track reference. Generally, it is not advisable to use characters that require quoting, because of the outrageously high number of non-conformant MTAs out there. Such emails are bound to get lost in the long run.
As a piece of friendly advice, this table is pretty useful too (from Jochen Topf, titled "Characters in the local part of an email address"): https://www.jochentopf.com/email/chars.html
It looks like RFC 6531 replaces RFC 5336, and it is a PROPOSED STANDARD:
https://www.rfc-editor.org/rfc/rfc6531
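A rough way to tell which case a given local part falls into is sketched below. The function name and the three labels are made up for this example; the character class is the atext set from RFC 5322 Section 3.2.3, and anything non-ASCII is flagged as needing the internationalized-email extension.

```python
import re

# atext from RFC 5322 Section 3.2.3: letters, digits and a fixed set of symbols.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = re.compile(ATEXT + r"+(?:\." + ATEXT + r"+)*")


def classify_local_part(local_part):
    """Classify a local part as plain dot-atom, ASCII needing quoting, or non-ASCII."""
    if DOT_ATOM.fullmatch(local_part):
        return "dot-atom"
    if local_part.isascii():
        return "ascii-needs-quoting"
    return "needs-internationalized-email"


for candidate in ["bjoern.nussbaum", "björn.nußbaum", "john smith"]:
    print(candidate, classify_local_part(candidate))
```

Only the first category is safe to send through any RFC 5322 compliant system; the last one depends on every mail server along the path supporting the extension.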

Parsing of HTTP Header Values: Quoting, RFC 5987, MIME, etc.

What confuses me is decoding of HTTP header values.
Example Header:
Some-Header: "quoted string?"; *utf-8'en'Weirdness
Can header values be quoted? What about the encoding of a " itself? Is ' a valid quote character? What's the significance of a semicolon (;)? Could the value parser for an HTTP header be considered a MIME parser?
I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields. That's why I need so much detail on the format.
Can header values be quoted?
If you mean does the RFC 5987 parameter production apply to the main part of the header value, then no.
Some-Header: "foo"; bar*=utf-8'en'bof
Here the main part of the header value would probably be "foo" including the quotes, but...
What's the significance of a semi-colon (;)?
The specific handling is defined for each named header separately. So semicolon is significant for, say, Content-Disposition, but not for Content-Length.
Obviously this is not a very satisfactory solution but that's what we're stuck with.
I am making a transparent proxy that needs to transparently handle and modify many in-the-wild header fields.
You can't handle these in a generic way, you have to know the form of each possible header. For anything you don't recognise, don't attempt to decompose the header value; and really, so little out there supports RFC 5987 at the moment, it's unlikely you'll be able to do much useful handling of it.
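One way to structure that in a proxy is a per-header dispatch table with a pass-through default, sketched below. The handler table, function names and the quote-aware splitter are all illustrative assumptions, not a complete or standards-exact parser; real handlers would need to follow each header's own grammar.

```python
def split_params(value):
    """Split on semicolons that are outside double-quoted strings."""
    parts, buf = [], []
    in_quotes = escaped = False
    for ch in value:
        if escaped:                      # character after a backslash in quotes
            buf.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            buf.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == ";" and not in_quotes:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


# Only headers whose format is known get a handler (hypothetical table).
HANDLERS = {
    "content-length": lambda value: int(value.strip()),
    "content-disposition": split_params,
}


def handle_header(name, value):
    """Decompose known headers; pass unknown ones through untouched."""
    handler = HANDLERS.get(name.lower())
    return handler(value) if handler else value


print(handle_header("Content-Disposition", 'attachment; filename="a;b.txt"'))
print(handle_header("Some-Header", '"foo"; bar*=utf-8\'en\'bof'))
```

The important design choice is the default: anything without a registered handler is forwarded byte for byte rather than re-serialized.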
The status quo today is that non-ASCII characters in header values don't work well enough cross-browser to be used at all, either encoded or raw.
Luckily they are rarely needed. The only really common use case is non-ASCII filenames for Content-Disposition but that's easier to work around by putting the filename in a trailing URL path part instead.
Could the value parser for a HTTP header be considered a MIME parser?
No. HTTP borrows heavily from MIME and the RFC 822 family of standards in general, but it isn't part of the 822 family. It has its own low-level grammar for headers, which looks like 822 but isn't quite compatible. Arbitrary MIME features can't be used in HTTP; there has to be a standardisation mechanism to drag them into HTTP explicitly, which is what RFC 5987 is for (parts of) RFC 2231.
(See section 19.4 of RFC 2616 for discussion of some other differences.)
In theory, a multipart form submission is part of the 822 family and you should be able to use RFC 2231 encoding there. But the reality is browsers don't support that either.