How to use 8bit encoding in multipart MIME messages in SMTP?

I have a question about how to use 8bit encoding in multipart MIME messages in SMTP.
According to the Wikipedia article on MIME, we can set "Content-Transfer-Encoding:" to 8bit.
In that case, do we need to use the 8BITMIME extension for SMTP?

Yes. You can only use Content-Transfer-Encoding: 8bit with an SMTP server that supports the 8BITMIME extension.
In RFC 2045 section 6.2, it says "Mail transport for unencoded 8bit data is defined in RFC 1652".
RFC 1652 (obsoleted by RFC 6152) describes and defines the 8BITMIME extension. While I don't see a specific sentence to quote to you, the entire document assumes that if you're sending 8bit data, you're using the 8BITMIME extension.
I haven't seen anything to the contrary.
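As an illustration, here is a minimal Python sketch (the helper names build_message and send_text are hypothetical, not from the question) that only labels the body 8bit when the server advertised 8BITMIME in its EHLO reply, and otherwise falls back to quoted-printable. RFC 6152 also expects the client to announce 8bit content with BODY=8BITMIME on MAIL FROM, which the stdlib smtplib does not add for you in this case, so it is passed explicitly.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, text, use_8bit):
    """Build a text/plain message; use_8bit must reflect the server's EHLO reply."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Raw UTF-8 octets in the body are only allowed on an 8BITMIME path;
    # otherwise use a 7-bit-safe transfer encoding.
    cte = "8bit" if use_8bit else "quoted-printable"
    msg.set_content(text, charset="utf-8", cte=cte)
    return msg


def send_text(host, sender, recipient, subject, text):
    """Hypothetical helper: connect, check 8BITMIME, build and send."""
    with smtplib.SMTP(host) as server:
        server.ehlo()
        use_8bit = server.has_extn("8bitmime")
        msg = build_message(sender, recipient, subject, text, use_8bit)
        # RFC 6152: declare 8bit content with the BODY parameter of MAIL FROM.
        options = ["BODY=8BITMIME"] if use_8bit else []
        server.send_message(msg, mail_options=options)
```

Falling back to quoted-printable rather than refusing to send is a design choice of this sketch; base64 would work equally well for the fallback.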

Related

How to encode the filename parameter value of the Content-Disposition header in MIME message?

By checking the source of some emails, I found that many emails use the 'Encoded Words' (RFC 2047) format to encode filename parameter values. However, according to RFC 2047, this encoding method should not be used for header parameter values. Instead, parameter values, such as the filename parameter of the Content-Disposition header, should use the encoding method specified by RFC 2231.
So my question is: why do so many emails not comply with the RFC standards? Is it correct to encode a header parameter value in RFC 2047 format? Can all email agents parse these emails properly?
The sad truth is that many popular email clients are in violation of pertinent RFCs.
Indeed, as you surmise, filenames in MIME body parts should use RFC2231, but many implementations out in the wild use RFC2047 or a number of other informal, ad-hoc, or at worst indeterminable filename encodings.
As for the "why", I don't really think this is answerable. Fundamentally I think we can't do better than guess it's a mistake at some level.
Common and easily identified incorrect encodings seem to work fairly transparently between popular clients; but by definition, failure to adhere to the specification removes any guarantee that the recipient can correctly guess what was intended.
For reference, here is a model message which should hopefully pass validation (-:
From: me <tripleee@example.org>
To: =?utf-8?Q?G=C3=B6del?= <goedel@example.net>
Subject: File name and recipient are identical,
 but encoded differently
Mime-Version: 1.0
Content-type: application/octet-stream;
 name*=UTF-8''G%C3%B6del
Content-disposition: attachment;
 filename*=UTF-8''G%C3%B6del
Content-transfer-encoding: base64

R8O2ZGVsCg==
For the record, the Content-Type: header's name parameter is superseded by the filename parameter of the Content-Disposition: header, but many implementations still conservatively specify both, in case some client somewhere still doesn't grok Content-Disposition:.
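A small Python sketch of producing an RFC 2231 filename parameter like the one in the model message, using the stdlib email.utils.encode_rfc2231. The function name content_disposition is hypothetical, and it assumes UTF-8 as the charset and no language tag. The usage check parses the header back with the stdlib parser, whose get_filename() understands the filename* form.

```python
from email.utils import encode_rfc2231, quote


def content_disposition(filename):
    """Return a Content-Disposition value with an RFC 2231 filename if needed."""
    try:
        filename.encode("ascii")
    except UnicodeEncodeError:
        # RFC 2231 form: filename*=charset'language'percent-encoded-octets
        return "attachment; filename*=" + encode_rfc2231(filename, "utf-8")
    # Plain ASCII names can use an ordinary quoted-string (escape \ and ").
    return 'attachment; filename="%s"' % quote(filename)
```

For example, content_disposition("Gödel") gives attachment; filename*=utf-8''G%C3%B6del, which matches the model message apart from the charset's letter case.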

How to manage Encoding Problems with äöü in email send through smtp?

I am sending email notifications that contain äöü in the video title, for example: "Vielseitigkeit-Europameisterschaft 2011: Freya Füllgraebe 1 ist nun in der Kategorie". The word containing äöü is changed into "Fllgraebe". How can I keep äöü intact when sending mail through SMTP in PHP?
The MIME specifications define various encoding algorithms for this depending on whether the text appears in the message body, one of the message headers, or in parameters to the Content-Type/Content-Disposition headers (such as filename).
All of the MIME specifications are hosted on www.ietf.org. For a set of quick links, check out https://github.com/jstedfast/MimeKit/blob/master/RFCs.md
Add some headers to your e-mail declaring the encoding you are using. Assuming you are using UTF-8, use the following headers:
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
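The question is about PHP, but the same idea is sketched here in Python for illustration (the function names are hypothetical). Besides the three headers above, a subject containing äöü must itself be turned into an RFC 2047 encoded-word, because header fields have to stay ASCII. The 8bit body assumes the relay path supports 8BITMIME.

```python
import email
from email import policy
from email.header import Header


def build_raw_message(subject, body):
    """Assemble a raw UTF-8 text/plain message (headers ASCII, body 8bit)."""
    headers = [
        "MIME-Version: 1.0",
        "Content-Type: text/plain; charset=UTF-8",
        "Content-Transfer-Encoding: 8bit",
        # Non-ASCII subjects become an RFC 2047 encoded-word.
        "Subject: " + Header(subject, "utf-8").encode(),
    ]
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("ascii")
    return head + body.encode("utf-8")


def read_back(raw):
    """Parse the bytes again to check that subject and body survive."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    return str(msg["Subject"]), msg.get_content()
```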

Can an email header have different character encoding than the body of the email?

Is an email with different character encodings for its header and body valid?
The use case: while processing an email, should I check the character encoding of its header separately, or is checking that of its body sufficient?
Can someone guide me as to how to figure this out?
Thanks in advance!
Email headers should use the ASCII charset. If you want header fields in a different encoding, you need to use the encoded-word syntax: http://en.wikipedia.org/wiki/MIME#Encoded-Word
The email body can be sent directly in an 8-bit encoding only if the mail servers that transfer it support 8BITMIME (nowadays every mail server should, but it's not guaranteed). Otherwise, you need to apply a transfer encoding to the body (quoted-printable or base64).
The charset can be different in each case, that is you can have every encoded word in different charset and every mail part encoded in different charset or even different transfer encoding as well.
For example, you can have a header value in UTF-8, encoded with Q (quoted-printable-style) encoding:
Subject: =?UTF-8?Q?Zg=C5=82oszenie?=
and the body encoded as:
Content-Type: text/plain; charset="iso-8859-2"
Content-Transfer-Encoding: base64

WmG/87PmIEfqtmyxIEphvPE=
Different charsets and different transfer encodings in the same email: no problem.
From experience I can tell you that such mails are very common. Even worse, you can get an email that states one charset in Content-Type header and another charset in html body meta tag:
Content-Type: text/html; charset="iso-8859-2"

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
It's up to you to guess the actual charset used. Probably it's the one in meta tag.
Assume nothing. Expect everything. Take no prisoners. This is Sparta.
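One possible way to do that guessing, sketched in Python: the heuristic below is hypothetical (not a standard algorithm), following the answer's hint that the meta tag is probably right. It tries the meta-tag charset first, then the header charset, and keeps the first one that decodes the bytes without error. Note that single-byte charsets such as ISO-8859-2 decode almost any byte sequence, which is why they are tried last.

```python
import re

# Matches charset=... inside a <meta ...> tag (http-equiv or HTML5 form).
META_CHARSET = re.compile(rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.I)


def guess_html_charset(html_bytes, declared):
    """Hypothetical heuristic: prefer the <meta> charset, then the header one."""
    candidates = []
    match = META_CHARSET.search(html_bytes)
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.append(declared)
    for charset in candidates:
        try:
            html_bytes.decode(charset)
            return charset
        except (LookupError, UnicodeDecodeError):
            continue  # unknown charset name, or the bytes don't fit it
    return None
```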

Is Content-Transfer-Encoding needed for multipart/alternative Content-Type?

I have an app that sends e-mails, and for many months it was working fine. I recently had problems with UTF-8 encoded emails sent to an iPhone Exchange account (i.e. not IMAP).
All the recipient saw was a big bunch of meaningless characters like LS0tLS0tPV9QYXJ0XzE0N18[....].
Comparing my email headers with Gmail, I noticed that I had an extra Content-Transfer-Encoding associated with the Content-Type: multipart/alternative;.
My email would look like
Delivered-To: ...
Received: ...
...
MIME-Version: ...
Content-Type: multipart/alternative;
 boundary="----=_boundary"
Content-Transfer-Encoding: Base64 # <= the extra setting

------=_boundary
Content-type: text/plain; charset=utf-8
Content-Transfer-Encoding: Base64

YmVu0Cg==
------=_boundary
Content-type: text/html; charset=utf-8
Content-Transfer-Encoding: Base64

PGh0bWwgeG1sbnM6bz0iIj48aGVhZD48dGl0bGU+PC90aXRsZT48L2hlYWQ+PGJvZHk+YmVub2l0
PC9ib2R5PjwvaHRtbD4NCjx9IjAiIC8+Cg==
------=_boundary--
If I remove the extra setting, my email is received and display properly.
My questions:
Is the Content-Transfer-Encoding header ever needed with Content-Type: multipart/alternative;, even in specific cases?
Is it safe to remove it and keep using my app as before?
Edit
I found on IETF:
Encoding considerations: Multipart content-types cannot have encodings.
But I also found on Wikipedia:
The content-transfer-encoding of a multipart type must always be
"7bit", "8bit" or "binary" to avoid the complications that would be
posed by multiple levels of decoding.
Isn't that contradictory?
The statements from IETF and wikipedia aren't really contradictory. 7bit, 8bit, or binary aren't really content encodings in that they don't specify any transformation of the content. They simply state that the content hasn't been encoded. In the case of 7bit it also specifies that the content doesn't need to be encoded even if the message needs to be sent over a transport that isn't 8-bit clean.
Only the bottom-most layers of a message should have an actual Content-Transfer-Encoding such as base64 or quoted-printable. In the message you quote, the outer portion certainly isn't base64 encoded, so declaring that it was not only violated the standard but was also simply incorrect. That would certainly be expected to cause problems when displaying the message.
In practice, each part of a multipart has its own encoding, and it doesn't make sense to declare one for the multipart container. I cannot make sense of the Wikipedia quote anyway; in any event, it is hardly authoritative.
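A Python sketch of both sides of this: build_alternative (a hypothetical name) uses the stdlib EmailMessage to base64-encode each leaf part while the multipart/alternative container gets no Content-Transfer-Encoding at all, and invalid_multipart_cte (also hypothetical) walks a parsed message and flags any multipart entity whose encoding is something other than 7bit, 8bit or binary, which is what the question's original message would trigger.

```python
from email.message import EmailMessage

ALLOWED = {"7bit", "8bit", "binary"}


def invalid_multipart_cte(msg):
    """Return content types of multipart entities with a forbidden CTE."""
    bad = []
    for part in msg.walk():
        # A missing header means the default, 7bit.
        cte = str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower()
        if part.is_multipart() and cte not in ALLOWED:
            bad.append(part.get_content_type())
    return bad


def build_alternative(text, html):
    """Encode only the leaf parts; the container carries no CTE header."""
    msg = EmailMessage()
    msg.set_content(text, cte="base64")
    msg.add_alternative(html, subtype="html", cte="base64")
    return msg
```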

Convert 7bit text to plain text in Perl

I am parsing an email message and found a part with encoding 7bit.
How can I convert the text of this part to plain text?
I use Perl.
Content-Transfer-Encoding: 7bit
means that the text is already plain old ASCII text. No conversion is necessary. (Well, unless the Content-Type header indicates a non-ASCII-based charset, but those are pretty rare, especially with 7bit text.)
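For comparison, the same point in Python's stdlib (part_text is a hypothetical helper): asking for the decoded payload of a 7bit part returns the stored text unchanged, so the only remaining step is applying the charset from Content-Type.

```python
import email

RAW = (
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "This is the body of the message.\n"
)


def part_text(part):
    # For 7bit/8bit/binary this returns the stored bytes unchanged;
    # base64 and quoted-printable parts would be decoded here.
    raw = part.get_payload(decode=True)
    charset = part.get_content_charset() or "us-ascii"
    return raw.decode(charset, errors="replace")
```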
It sounds like you have UU-encoded data (the older method) or MIME-encoded data. To deal with those, you can use the Convert::UU and MIME::Base64 CPAN modules, respectively.
To use MIME::Base64 (or its pure-Perl implementation, MIME::Base64::Perl):
use MIME::Base64::Perl;
my $decoded = decode_base64($encoded);
How do you know the difference?
Modern MIME-encoded text looks like this. Pay particular attention to the MIME-Version: header, which tells you it's MIME-encoded, and to the Content-Transfer-Encoding header, which tells you the encoding; if it's not base64, you need a different CPAN module:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="frontier"

This is a message with multiple parts in MIME format.
--frontier
Content-Type: text/plain

This is the body of the message.
--frontier
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64
UU-encoded text would look something like:
begin 644 cat.txt
#0V%T
`
end
If the encoded data looks different from both of the samples above, please post the exact format so we can determine what it is.
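For the UU case, a minimal Python sketch (uudecode is a hypothetical helper, not Convert::UU) that decodes the first begin/end block line by line with the stdlib binascii.a2b_uu. It skips blank lines because a2b_uu treats an empty line as a 32-byte line of zeros.

```python
import binascii


def uudecode(text):
    """Decode the first begin/end block of uuencoded text into bytes."""
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("begin "):
            break
    else:
        raise ValueError("no 'begin' line found")
    out = bytearray()
    for line in lines:
        if line.strip() == "end":
            return bytes(out)
        if not line.strip():
            continue  # ignore blank lines
        out += binascii.a2b_uu(line)  # first char encodes the line length
    raise ValueError("no 'end' line found")
```

With the sample above, the data line #0V%T decodes to b"Cat", and the backtick line is a zero-length terminator.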