Maximum length of reply-to header - email

Is there a maximum length for the Reply-To header field? The maximum length of an e-mail address is defined, but the Reply-To field can contain several addresses.

According to the latest specification, RFC 5322:
2.2.3. Long Header Fields
Each header field is logically a single line of characters
comprising the field name, the colon, and the field body. For
convenience however, and to deal with the 998/78 character
limitations per line, the field body portion of a header field can
be split into a multiple-line representation; this is called
"folding". The general rule is that wherever this specification
allows for folding white space (not simply WSP characters), a CRLF
may be inserted before any WSP.
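Section 2.2.3 also states that an unfolded header field has no length restriction. Read together with the quote above, RFC 5322 limits each physical line (998 characters, 78 recommended), not the Reply-To field as a whole, so a long address list is simply folded between addresses. The sketch below assumes Python 3.6+ and its standard `email` package with `policy.default`, whose `max_line_length` is 78. The helper names and the example.com addresses are hypothetical. It puts 40 addresses into one Reply-To field, lets the library fold it, and parses the message back.

```python
from email import policy
from email.message import EmailMessage
from email.parser import Parser


def build_reply_to_message(addresses):
    """Hypothetical helper: put many addresses into one Reply-To field."""
    msg = EmailMessage(policy=policy.default)  # folds header lines at 78 characters
    msg["Reply-To"] = ", ".join(addresses)
    msg.set_content("Body text\n")
    return msg.as_string()


def reply_to_addresses(raw_message):
    """Parse the message again; folding is undone before the list is read."""
    msg = Parser(policy=policy.default).parsestr(raw_message)
    return [addr.addr_spec for addr in msg["Reply-To"].addresses]


if __name__ == "__main__":
    addrs = [f"user{i:02d}@example.com" for i in range(40)]
    raw = build_reply_to_message(addrs)
    print(raw.split("\n\n", 1)[0])  # headers only; Reply-To spans several lines
    print(len(reply_to_addresses(raw)))
```

The field is one logical header of roughly 800 characters, yet no physical line exceeds 78 characters, and a parser recovers all 40 addresses.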

Related

Are newlines in MIME headers using encoded-words legal?

RFC 2047 defines the encoded-words mechanism for encoding non-ASCII characters in MIME documents. It specifies that whitespace characters (spaces and tabs) are not allowed inside an encoded-word.
However, RFC 5322, which defines the Internet message format, specifies that long header lines should be "folded". Should this folding be undone before or after the encoded-words are decoded?
I recently received an email where the encoded-text part of the header had a newline in it, like this:
Header: =?UTF-8?Q?=C3=A5
=C3=A4?=
Would this be valid?
Of course emails can be invalid in lots of exciting ways and the parser needs to handle that, but it's interesting to know the "correct" way. :)
I misread the question and answered as if it were about a different sort of whitespace. In this case the whitespace appears inside a single MIME word, not between multiple words.
This sort of thing is explicitly disallowed. From the introduction to the format in RFC2047:
2. Syntax of encoded-words
An 'encoded-word' is defined by the following ABNF grammar. The
notation of RFC 822 is used, with the exception that white space
characters MUST NOT appear between components of an 'encoded-word'.
And then later on in the same section:
IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
by an RFC 822 parser. As a consequence, unencoded white space
characters (such as SPACE and HTAB) are FORBIDDEN within an
'encoded-word'. For example, the character sequence
=?iso-8859-1?q?this is some text?=
would be parsed as four 'atom's, rather than as a single 'atom' (by
an RFC 822 parser) or 'encoded-word' (by a parser which understands
'encoded-words'). The correct way to encode the string "this is some
text" is to encode the SPACE characters as well, e.g.
=?iso-8859-1?q?this=20is=20some=20text?=
The characters which may appear in 'encoded-text' are further
restricted by the rules in section 5.
Earlier answer
This sort of thing is explicitly allowed. Each line of a header that contains MIME encoded-words should be 76 characters or less, folded if needed. In RFC 822, the second and any subsequent lines of a folded header are indented. In RFC 2047 headers, continuation lines are supposed to be indented by only one space. The whitespace between the ?= that ends the first line and the =? that starts the next one should be suppressed from the output.
See the example at the bottom of page 12 of the RFC:
encoded form                                displayed as
---------------------------------------------------------------------
(=?ISO-8859-1?Q?a?=                         (ab)
    =?ISO-8859-1?Q?b?=)
Any amount of linear-space-white between 'encoded-word's,
even if it includes a CRLF followed by one or more SPACEs,
is ignored for the purposes of display.
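To see that display rule in practice, here is a small sketch using Python's standard `email.header.decode_header` and `make_header` on the RFC 2047 example. It assumes Python 3; `display_text` is a hypothetical helper name. Both the same-line form and the folded form (CRLF followed by a space) decode to the same text, because the whitespace between adjacent encoded-words is dropped.

```python
from email.header import decode_header, make_header


def display_text(raw_header_value):
    """Decode RFC 2047 encoded-words into the text a mail client would show."""
    return str(make_header(decode_header(raw_header_value)))


if __name__ == "__main__":
    same_line = "=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?="
    folded = "=?ISO-8859-1?Q?a?=\r\n =?ISO-8859-1?Q?b?="
    print(display_text(same_line))  # ab
    print(display_text(folded))     # ab
```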

How to format an email 'From' header that contains a comma

The standard way to format the 'From' email header is
From: John Doe <john.doe@example.com>
But what to do if there's a comma in the name?
From: John Doe, chief bottle washer <john.doe@example.com>
If I do that, my MTA automatically converts this into:
From: John@this.server.com, Doe@this.server.com, chief bottle washer <john.doe@example.com>
My first guess is to use double-quotes around the full name, but I can't find any official documentation confirming this and I'd like my emails to be readable by all email clients.
To elaborate on the answer by @Fls'Zen: yes, the proper method is to enclose the name in double quotes.
From a practical point of view there's no harm in wrapping all names in double quotes; just be sure to escape any double quote that appears in the display name as \" (or replace it with a single quote). But if you want to follow the spec strictly, you shouldn't use double quotes when you don't have to.
For all the dense details, e-mail header fields are defined by RFC 5322. The relevant section for multiple originators in the From header is 3.6.2, and the relevant sections for quoting delimiters are 3.2.1 and 3.2.4.
When the following regular expression matches, the display name must be quoted.
[^-A-Za-z0-9!#$%&'*+/=?^_`{|}~\s]
For ASCII characters, this can be done by escaping any double quote characters with a backslash, and enclosing the string in double quotes. For non-ASCII characters, the more complex MIME escaping is required.
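A sketch of that rule in Python, assuming Python 3.7+ (for `str.isascii`). `format_from` is a hypothetical helper that applies the regular expression above to ASCII names and escapes backslashes and double quotes. For non-ASCII names it hands off to the standard library's `email.utils.formataddr`, which produces the MIME encoded-word form.

```python
import re
from email.utils import formataddr

# Any character outside RFC 5322 "atext" (plus whitespace) forces quoting.
NEEDS_QUOTING = re.compile(r"[^-A-Za-z0-9!#$%&'*+/=?^_`{|}~\s]")


def format_from(display_name, address):
    """Build a 'Name <address>' value, quoting the name only when required."""
    if not display_name.isascii():
        # Non-ASCII names need RFC 2047 encoding; let the stdlib do it.
        return formataddr((display_name, address))
    if NEEDS_QUOTING.search(display_name):
        escaped = display_name.replace("\\", "\\\\").replace('"', '\\"')
        display_name = f'"{escaped}"'
    return f"{display_name} <{address}>"


if __name__ == "__main__":
    print(format_from("John Doe", "john.doe@example.com"))
    # John Doe <john.doe@example.com>
    print(format_from("John Doe, chief bottle washer", "john.doe@example.com"))
    # "John Doe, chief bottle washer" <john.doe@example.com>
```

For ASCII names this gives the same result as calling `formataddr` directly, which also quotes only when the name contains special characters.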

Should I look for e-mail header fields names in case-sensitive or case-insensitive manner?

Section 2.2 of RFC 2822 defines e-mail message header fields. However, it doesn't say explicitly whether header field names should be interpreted in a case-sensitive or case-insensitive manner.
For example, if I want to find the "Carbon Copy" field, should I look for "Cc:" case-sensitively? Or, if a message already has a "Cc:" field, can it also have a "CC:" field? Does the requirement to interpret field names case-sensitively or case-insensitively apply to all fields or only to some?
If the RFC doesn't define it, it is left as an implementation detail.
To be safe, I would go with case-insensitive to allow for different implementations to work without failing.
By the way, RFC 2822 has been obsoleted by RFC 5322 (which also has no such discussion).
See section 1.2.2, "Syntactic Notation", in RFC 5322: "Characters will be specified either by a decimal value (e.g., the value %d65 for uppercase A and %d97 for lowercase A) or by a case-insensitive literal value enclosed in quotation marks (e.g., "A" for either uppercase or lowercase A)." Later on, the header field names are specified as quoted literals, which means they are case-insensitive.
From my experience you should use case insensitive checks as different clients/servers do different things with the headers.
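Following the case-insensitive advice, here is a sketch in Python 3. The standard library's `Message.get_all` already compares field names case-insensitively; `field_names` is a hypothetical helper that does the same by hand (lower-casing each name) so duplicates are easy to spot. Because names are case-insensitive, "Cc:" and "CC:" are the same field, and the table in RFC 5322 section 3.6 allows Cc at most once, so the made-up sample message below is not well-formed. The manual helper assumes LF line endings.

```python
from collections import Counter
from email import message_from_string

# Made-up message with the same field written as "Cc" and "CC".
RAW = (
    "From: alice@example.com\n"
    "Cc: bob@example.com\n"
    "CC: carol@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)


def field_values(raw_message, field_name):
    """All values of field_name; the stdlib matches names case-insensitively."""
    return message_from_string(raw_message).get_all(field_name, [])


def field_names(raw_message):
    """Lower-cased field names in order, skipping folded continuation lines."""
    header_block = raw_message.split("\n\n", 1)[0]
    names = []
    for line in header_block.splitlines():
        if line[:1] in (" ", "\t"):  # continuation of the previous field
            continue
        names.append(line.split(":", 1)[0].strip().lower())
    return names


if __name__ == "__main__":
    print(field_values(RAW, "cc"))  # ['bob@example.com', 'carol@example.com']
    dupes = [name for name, count in Counter(field_names(RAW)).items() if count > 1]
    print(dupes)  # ['cc']
```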

What do the numbers in a multi-part email mean?

I'm looking at the source of a multi-part message from Thunderbird (in hopes of writing my own multi-part message from C++/Javascript)
I was wondering what the following means (the part between the text-only part and the HTML part of the email) and how I might calculate it so that my own program can generate a multi-part email:
This is a multi-part message in MIME format.
------=_NextPart_32252.1057009685.31.001
Content-Type: multipart/alternative;
boundary="----=_NextPart_32252.1057009685.31.002"
Content-Description: Message in alternative text and HTML forms
------=_NextPart_32252.1057009685.31.002
The rest of the message source makes sense to me, for the most part.
The numbers you are seeing within the boundary delimiters don't necessarily mean anything (although the RFC doesn't preclude an implementor from trying to include some meaning).
They must be unique and not contained within the part that they encapsulate.
From RFC 2046:
5.1. Multipart Media Type
In the case of multipart entities, in which one or more different sets of data are combined in a single body, a "multipart" media type field must appear in the entity's header. The body must then contain one or more body parts, each preceded by a boundary delimiter line...
As stated previously, each body part is preceded by a boundary
delimiter line that contains the boundary delimiter. The boundary
delimiter MUST NOT appear inside any of the encapsulated parts, on a
line by itself or as the prefix of any line...
...
5.1.1. Common Syntax
The Content-Type field for multipart entities requires one parameter, "boundary". The boundary delimiter line is then defined as a line consisting entirely of two hyphen characters ("-", decimal value 45) followed by the boundary parameter value from the Content-Type header field, optional linear whitespace, and a terminating CRLF.
...
NOTE: Because boundary delimiters must not appear in the body parts
being encapsulated, a user agent must exercise care to choose a
unique boundary parameter value. The boundary parameter value
[could be] the result of an algorithm designed to
produce boundary delimiters with a very low probability of already
existing in the data to be encapsulated without having to prescan the
data. ... The
simplest boundary delimiter line possible is something like "---",
with a closing boundary delimiter line of "-----".
They don't mean anything. They are just a random string that does not occur within the body of the email. They are just used to mark where the embedded message starts and stops.
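Since the numbers carry no meaning, a generator only needs a string that is very unlikely to occur in the parts. The Python 3 sketch below (standard library only) picks a random boundary with `secrets.token_hex`, re-rolls if it happens to appear in any part, and assembles a multipart/alternative body with CRLF line endings. The `=_Part_` prefix, the helper names, and the assumption that the text and HTML are short ASCII strings are hypothetical choices for illustration; RFC 2046 permits "=" and "_" in a boundary and limits it to 70 characters.

```python
import secrets
from email import message_from_string


def make_boundary(parts):
    """Return a random boundary that does not occur anywhere in the parts."""
    while True:
        boundary = "=_Part_" + secrets.token_hex(16)  # 39 characters, <= 70
        if not any(boundary in part for part in parts):
            return boundary


def build_alternative(text, html):
    """Assemble a multipart/alternative message (ASCII content assumed)."""
    boundary = make_boundary([text, html])
    lines = [
        "MIME-Version: 1.0",
        f'Content-Type: multipart/alternative; boundary="{boundary}"',
        "",
        "This is a multi-part message in MIME format.",
        f"--{boundary}",
        'Content-Type: text/plain; charset="us-ascii"',
        "",
        text,
        f"--{boundary}",
        'Content-Type: text/html; charset="us-ascii"',
        "",
        html,
        f"--{boundary}--",  # the closing delimiter adds two trailing hyphens
        "",
    ]
    return "\r\n".join(lines)


def part_types(raw_message):
    """Parse the result back and list the content type of each part."""
    msg = message_from_string(raw_message)
    return [part.get_content_type() for part in msg.get_payload()]


if __name__ == "__main__":
    raw = build_alternative("Hello", "<p>Hello</p>")
    print(part_types(raw))  # ['text/plain', 'text/html']
```

The substring check is stricter than the RFC requires (it only forbids the delimiter at the start of a line), which keeps the code simple at negligible cost.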

What is the email subject length limit?

How many characters are allowed to be in the subject line of Internet email?
I scanned the RFC for email but could not see specifically how long it is allowed to be.
I have a colleague that wants to programmatically validate for it.
If there is no formal limit, what is a good length in practice to suggest?
See RFC 2822, section 2.1.1 to start.
There are two limits that this standard places on the number of characters in a line. Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF.
As the RFC states later, you can work around this limit (not that you should) by folding the subject over multiple lines.
Each header field is logically a single line of characters comprising the field name, the colon, and the field body. For convenience however, and to deal with the 998/78 character limitations per line, the field body portion of a header field can be split into a multiple line representation; this is called "folding". The general rule is that wherever this standard allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP. For example, the header field:
Subject: This is a test
can be represented as:
Subject: This
 is a test
The recommendation for no more than 78 characters in the subject header sounds reasonable. No one wants to scroll to see the entire subject line, and something important might get cut off on the right.
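The folding rule quoted above (a CRLF may be inserted before any whitespace) can be implemented by hand in a few lines. This Python 3 sketch uses hypothetical helper names and treats the field body as unstructured text split on single spaces. It only folds before non-empty words, so no continuation line consists solely of whitespace, and a single word longer than the limit is left on an overlong line. `unfold_header` undoes the folding by deleting every CRLF that is immediately followed by a space or tab.

```python
import re


def fold_header(name, body, limit=78):
    """Fold 'name: body' by inserting CRLF before spaces (RFC 5322 2.2.3)."""
    lines = []
    current = f"{name}:"
    for word in body.split(" "):
        candidate = f"{current} {word}"
        # Break only before a real word, and never emit a whitespace-only line.
        if word and len(candidate) > limit and current.strip():
            lines.append(current)
            current = f" {word}"  # continuation lines start with whitespace
        else:
            current = candidate
    lines.append(current)
    return "\r\n".join(lines)


def unfold_header(folded):
    """Undo folding: drop each CRLF that is immediately followed by WSP."""
    return re.sub(r"\r\n(?=[ \t])", "", folded)


if __name__ == "__main__":
    print(repr(fold_header("Subject", "This is a test", limit=14)))
    # 'Subject: This\r\n is a test'
```

With a deliberately tiny limit this reproduces the RFC's own example, and unfolding it gives back "Subject: This is a test".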
RFC 2822 states that an unfolded header field, such as the subject, "has no length restriction", but to produce long headers you need to split them across multiple lines, a process called "folding".
Subject is defined as "unstructured" in RFC 5322.
Here are some quotes ([...] indicates text I omitted):
3.6.5. Informational Fields
The informational fields are all optional. The "Subject:" and
"Comments:" fields are unstructured fields as defined in section
2.2.1, [...]
2.2.1. Unstructured Header Field Bodies
Some field bodies in this specification are defined simply as
"unstructured" (which is specified in section 3.2.5 as any printable
US-ASCII characters plus white space characters) with no further
restrictions. These are referred to as unstructured field bodies.
Semantically, unstructured field bodies are simply to be treated as a
single line of characters with no further processing (except for
"folding" and "unfolding" as described in section 2.2.3).
2.2.3 [...] An unfolded header field has no length restriction and
therefore may be indeterminately long.
After some testing: if you send an email to an Outlook client, the subject is longer than 77 characters, and it needs to use "=?ISO" encoding inside the subject (in my case because of accented characters), then Outlook will cut the subject in the middle and mangle everything that comes after it, including the body text and attachments. It's all a mess!
I have several examples like this one:
Subject: =?ISO-8859-1?Q?Actas de la obra N=BA.20100154 (Expediente N=BA.20100182) "NUEVA RED FERROVIARIA.=
TRAMO=20BEASAIN=20OESTE(Pedido=20PC10/00123-125),=20BEASAIN".?=
To:
As you can see, the subject line was cut at character 78 with an "=" followed by two or three line feeds, and the rest of the subject then continued in a broken form.
This was reported to me by several customers who were all using Outlook; other email clients handle those subjects fine.
If the subject has no ISO encoding, there is no problem, but if you add it to be nice to the RFC, you get this surprise from Outlook. But if you don't add the ISO encoding, the iPhone mail client will not understand it (and attached files whose names use such characters will not work on iPhones).
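For comparison, RFC 2047 (quoted in the encoded-words thread above) forbids raw spaces inside an encoded-word, and the example subject above has spaces and a quoted-printable soft line break ("=" at the end of a line) inside a single one. This sketch does not test Outlook. It only shows how Python's standard `email.header.Header` splits a long ISO-8859-1 subject into several short encoded-words, one per folded line, each line within the 76-character limit RFC 2047 sets for lines containing encoded-words. The helper names are hypothetical.

```python
from email.header import Header, decode_header, make_header


def encode_subject(subject, charset="iso-8859-1"):
    """Encode a subject as RFC 2047 encoded-words folded to 76 characters."""
    header = Header(subject, charset, maxlinelen=76, header_name="Subject")
    return header.encode()  # continuation lines are joined with "\n "


def decode_subject(encoded):
    """Decode the encoded-words back into the original text."""
    return str(make_header(decode_header(encoded)))


if __name__ == "__main__":
    subject = ('Actas de la obra Nº.20100154 (Expediente Nº.20100182) '
               '"NUEVA RED FERROVIARIA. TRAMO BEASAIN OESTE '
               '(Pedido PC10/00123-125), BEASAIN".')
    encoded = encode_subject(subject)
    print("Subject: " + encoded)
    print(decode_subject(encoded) == subject)
```

Spaces are encoded as "_" inside each word, so the whitespace between the folded encoded-words can be discarded on decoding without losing any of the original text.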
Limits in the context of Unicode multi-byte character capabilities
While RFC5322 defines a limit of 1000 (998 + CRLF) characters, it does so in the context of headers limited to ASCII characters only.
RFC 6532 explains how to handle multi-byte Unicode characters.
Section 3.4 (Effects on Line Length Limits) states:
Section 2.1.1 of [RFC5322] limits lines to 998 characters and
recommends that the lines be restricted to only 78 characters. This
specification changes the former limit to 998 octets. (Note that, in
ASCII, octets and characters are effectively the same, but this is
not true in UTF-8.) The 78-character limit remains defined in terms
of characters, not octets, since it is intended to address display
width issues, not line-length issues.
So for example, because you are limited to 998 octets, you can't have 998 smiley faces in your subject line as each emoji of this type is 4 octets.
Using PHP to demonstrate:
Run php -a for an interactive terminal.
// Multi-byte string length (counts characters):
var_export(mb_strlen("\u{0001F602}",'UTF-8'));
// 1
// Byte length (strlen() counts octets, not characters):
var_export(strlen("\u{0001F602}"));
// 4
// Byte-based substring keeping all four octets of the character:
var_export(substr("\u{0001F602}",0,4));
// '😂'
// Byte-based substring truncated to 3 octets, which breaks the character:
var_export(substr("\u{0001F602}",0,3));
// '▒' (an invalid byte sequence; how it is shown depends on the terminal)
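The same octet-versus-character distinction in Python 3, as a sketch with hypothetical helper names: `len()` counts code points, while the UTF-8 encoding gives the octet count that RFC 6532's 998 limit applies to. `truncate_to_octets` cuts on a byte budget but, unlike the raw `substr()` call above, drops any trailing partial character instead of leaving invalid bytes behind. Counting code points is only an approximation of the 78-character display limit, since some characters take more room on screen than others.

```python
HARD_LIMIT_OCTETS = 998   # RFC 6532: per line, in octets, excluding CRLF
SOFT_LIMIT_CHARS = 78     # RFC 5322/6532: per line, in characters


def utf8_octets(text):
    """Number of octets the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def check_line(line):
    """Return (within the 998-octet limit, within the 78-character advice)."""
    return utf8_octets(line) <= HARD_LIMIT_OCTETS, len(line) <= SOFT_LIMIT_CHARS


def truncate_to_octets(text, max_octets):
    """Cut text to at most max_octets UTF-8 octets without splitting a character."""
    return text.encode("utf-8")[:max_octets].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    emoji = "\U0001F602"
    print(len(emoji), utf8_octets(emoji))             # 1 4
    print(check_line(emoji * 250))                    # (False, False)
    print(len(truncate_to_octets(emoji * 250, 998)))  # 249
```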
I don't believe that there is a formal limit here, and I'm pretty sure there isn't any hard limit specified in the RFC either, as you found.
I think that some pretty common limitations for subject lines in general (not just e-mail) are:
80 Characters
128 Characters
256 Characters
Obviously, you want to come up with something that is reasonable. If you're writing an e-mail client, you may want to go with something like 256 characters, and obviously test thoroughly against big commercial servers out there to make sure they serve your mail correctly.
Hope this helps!
What's important is which mechanism you are using to send the email. Most modern libraries (e.g. System.Net.Mail) will hide the folding from you: you just pass in a very long subject line without any CR, LF, or HTAB characters. If you start trying to do your own folding, all bets are off, and the library will start reporting errors. So if you are having this issue, just filter out the CR, LF, and HTAB characters and let the library do the work for you. You can usually also set the text encoding as a separate property, so there is no need to put ISO encoding in the subject line yourself.
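In Python's standard library the equivalent looks roughly like this (a sketch, assuming Python 3.6+ and `email.policy.default`). With that policy, assigning a header value that contains CR or LF raises `ValueError`, so the hypothetical helper below first turns CR, LF and HTAB into spaces. Replacing rather than deleting them is our own choice, so that words on either side are not glued together. The library then folds the header and applies any encoding the text needs.

```python
from email import policy
from email.message import EmailMessage
from email.parser import Parser

# Map CR, LF and HTAB to spaces before handing the subject to the library.
_CONTROL_TO_SPACE = str.maketrans({"\r": " ", "\n": " ", "\t": " "})


def build_message(subject, body="Body text\n"):
    """Hypothetical helper: clean the subject, then let the library fold it."""
    msg = EmailMessage(policy=policy.default)  # folds header lines at 78
    msg["Subject"] = subject.translate(_CONTROL_TO_SPACE)
    msg.set_content(body)
    return msg.as_string()


def read_subject(raw_message):
    """Parse the message back; the Subject comes out unfolded."""
    return str(Parser(policy=policy.default).parsestr(raw_message)["Subject"])


if __name__ == "__main__":
    long_subject = "Status update\nfor " + " ".join(f"item{i:02d}" for i in range(30))
    raw = build_message(long_subject)
    print(raw.split("\n\n", 1)[0])  # the Subject header is folded by the library
    print(read_subject(raw))
```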