How to escape a full email address for SMTP in the headers when the email address contains non-ascii chars - email

It's about sending emails with non ASCII chars in the email address.
When I use send the TO /RCPT stuff to the SMTP server I know that I need to use punycode here.
But what about the To: and From: Header. Again I know that if the User friendly part contains a non ascii char I con use the standard header encoding that I also use for the subject. But this encoding is only used for the user friendly part.
But what if the email address contains a non ascii char? How must the To header be formatted.
So how to encode "Tüst" ?
This is the encoding as far as I know.
"=?iso-8859-1?Q?T=FCst?="<tüst#domain.de>
But what with the email address.
In fact: I don't understand the RFC's. I tried hard but failed.

The answer is: UTF-8 is the correct way to encode the header.
After some more research I found the answer hidden inside this article:
https://en.wikipedia.org/wiki/International_email
Although the traditional format for email header section allows
non-ASCII characters to be included in the value portion of some of
the header fields using MIME-encoded words (e.g. in display names or
in a Subject header field), MIME-encoding must not be used to encode
other information in a header, such as an email address, or header
fields like Message-ID or Received. Moreover, the MIME-encoding
requires extra processing of the header to convert the data to and
from its MIME-encoded word representation, and harms readability of a
header section.
The 2012 standards RFC 6532 and RFC 6531 allow the inclusion of
Unicode characters in a header content using UTF-8 encoding, and their
transmission via SMTP - but in practice support is only slowly rolling
out.[5]

Related

How to send a csv attachment with lines longer than 990 characters?

Alright. I thought this problem had something to do with my rails app, but it seems to have to do with the deeper workings of email attachments.
I have to send out a csv file from my rails app to a warehouse that fulfills orders places in my store. The warehouse has a format for the CSV, and ironically the header line of the CSV file is super long (1000+ characters).
I was getting a line break in the header line of the csv file when I received the test emails and couldn't figure out what put it there. However, some googling has finally showed the reason: attached files have a line character limit of 1000. Why? I don't know. It seems ridiculous, but I still have to send this csv file somehow.
I tried manually setting the MIME type of the attachment to text/csv, but that was no help. Does anybody know how to solve this problem?
Some relevant google results : http://www.google.com/search?client=safari&rls=en&q=csv+wrapped+990&ie=UTF-8&oe=UTF-8
update
I've tried encoding the attachment in base64 like so:
attachments['205.csv'] = {:data=> ActiveSupport::Base64.encode64(#string), :encoding => 'base64', :mime_type => 'text/csv'}
That doesn't seem to have made a difference. I'm receiving the email with a me.com account via Sparrow for Mac. I'll try using gmail's web interface.
This seems to be because the SendGrid mail server is modifying the attachment content. If you send an attachment with a plain text storage mime type (e.g text/csv) it will wrap the content every 990 characters, as you observed. I think this is related to RFC 2045/821:
Content-Transfer-Encoding Header Field
Many media types which could be usefully transported via email are
represented, in their "natural" format, as 8bit character or binary
data. Such data cannot be transmitted over some transfer protocols.
For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII
data with lines no longer than 1000 characters including any trailing
CRLF line separator.
It is necessary, therefore, to define a standard mechanism for
encoding such data into a 7bit short line format. Proper labelling
of unencoded material in less restrictive formats for direct use over
less restrictive transports is also desireable. This document
specifies that such encodings will be indicated by a new "Content-
Transfer-Encoding" header field. This field has not been defined by
any previous standard.
If you send the attachment using base64 encoding instead of the default 7-bit the attachment remains unchanged (no added line breaks):
attachments['file.csv']= { :data=> ActiveSupport::Base64.encode64(#string), :encoding => 'base64' }
Could you have newlines in your data that would cause this? Check and see if
csv_for_orders(orders).lines.count == orders.count
If so, a quick/hackish fix might be changing where you call values_for_line_item(item) to values_for_line_item(item).map{|c| c.gsub(/(\r|\n)/, '')} (same for the other line_item calls).

Is it possible to send email to an address that contains latin unicode characters with cfmail?

We need to be able to send an email with cfmail to an email address that contains a latin a with acute. I assume we'll eventually have to allow other Unicode characters too - a sample email address is foobár#example.com. ColdFusion throws an error on this email address, which is technically valid. Since the acute a is a UTF-8 character, and the default encoding for cfmail is UTF-8, I'm not sure what other settings I would need to enable to make this work. Is this possible?
The error I get is Attribute validation error for tag CFMAIL.
Detail: The value of the attribute to, which is currently foobár#example.com, is invalid.
I'm neither an I18N nor email expert but my understanding FWIW is that current systems don't generally support unicode in the local part of the email address, i.e. the mailbox name before the #. Local mail servers may support it and allow a name such as foobár internally, but if that person wants to receive mail from the outside world they will also need an ASCII alias such as foobar.
There is however a mechanism for supporting unicode in the domain portion of the address, which involves conversion to an ASCII representation called punycode. This means an address such as foo#foobár.com will be converted to foo#xn--foobr-0qa.com which current DNS and mail systems will accept.
It's possible to do this conversion in ColdFusion by using existing Java libraries. For more detail see this question.

What means Zend_Mime::ENCODING_8BIT when sending mails with Zend_Mail?

In the example for Zend_Mail on http://framework.zend.com/manual/en/zend.mail.attachments.html they use ENCODING_8BIT but searching for what that might be sends me to http://msdn.microsoft.com/en-us/library/ms526992%28EXCHG.10%29.aspx were (and this sounds logical to me) it is explained that 8bit encoding does not make sense for emails.
Edit:
When I use this encoding for a mail with an attachment, I receive the mail with a corrupted attachment in my mail software (Thunderbird)
In which cases does it make sense to use ENCODING_8BIT?
As everybody said, ENCODING_8BIT represents the Content Transfer Encoding.
Basically, 8BITMIME is used for Internationalization. It's using a 8-bit character sets and therefore, allow you to send any character supported in the UTF8 charset.
In general, non-MIME mailers send 8-bit data but do not include any
MIME headers to mark the message as 8-bit data. MIME mailers should
cope with this without any problems. [source]
So basically there is not really a case where it makes sense to use ENCODING_8BIT over another encoding since emails in UTF8 are a standard today. Also, note that most of the MTAs (Message Transfer Agent, such as Postfix, etc.) automatically force the encoding to 8BITMIME (UTF-8).
Here is a good resource about the 8BITMIME encoding.
The 8BITMIME extension has two effects in practice:
The client will avoid Q-P conversion.
The client may add extra
information at the end of a MAIL request: a space followed by either
"BODY=7BIT" or "BODY=8BITMIME".
Zend_Mime::ENCODING_8BIT sets the Content-Transfer-Encoding.
The Content-Transfer-Encoding defines methods for representing binary data in ASCII text format.
The use of Zend_Mime::ENCODING_8BIT in the example is a Bug.
For sending Attachments you should always use Zend_Mime::ENCODING_BASE64
Not for email but for attachements. If you take a look on the RFC 2045 at page 7:
RFC2045
"Binary data" refers to data where any
sequence of octets whatsoever is
allowed.

Can an email address contain international (non-english) characters?

If it's possible, should I accept such emails from users and what problems to expect when I will be sending mails to such addresses?
Officially, per RFC 6532 - Yes.
For a quick explanation, check out wikipedia on the subject.
Update 2015: Use RFC 6532
The experimental 5335 has been Obsoleted by: 6532 and
this later has been set to "Category: Standards Track",
making it the standard.
The Section 3.2 (Syntax Extensions to RFC 5322) has updated most text fields to
include (proper) UTF-8.
The following rules extend the ABNF syntax defined in [RFC5322] and
[RFC5234] in order to allow UTF-8 content.
VCHAR =/ UTF8-non-ascii
ctext =/ UTF8-non-ascii
atext =/ UTF8-non-ascii
qtext =/ UTF8-non-ascii
text =/ UTF8-non-ascii
; note that this upgrades the body to UTF-8
dtext =/ UTF8-non-ascii
The preceding changes mean that the following constructs now
allow UTF-8:
1. Unstructured text, used in header fields like
"Subject:" or "Content-description:".
2. Any construct that uses atoms, including but not limited
to the local parts of addresses and Message-IDs. This
includes addresses in the "for" clauses of "Received:"
header fields.
3. Quoted strings.
4. Domains.
Note that header field names are not on this list; these are still
restricted to ASCII.
Please note the explicit inclusion of Domains.
And the explicit exclusion of header names.
Also Note about NFKC:
The UTF-8 NFKC normalization form SHOULD NOT be used because
it may lose information that is needed to correctly spell
some names in some unusual circumstances.
And Section 3 start:
Also note that messages in this format require the use of the
SMTPUTF8 extension [RFC6531] to be transferred via SMTP.
The problem is that some mail clients (server-tools and / or desktop tools) don't support it and throw an 'invalid email' exception when you try to send a mail to an address which contains umlauts for example.
If you want full support, you could do the trick with converting the email-address parts to "punycode". This allows users to type in their addresses the usual way but you save it the supported-level way.
Example: müller.com » xn--mller-kva.com
Both points to the same thing.
I would assume yes since a number of top level domains already allow non ascii
characters for domains and since the domain is part of an email address, it's
perfectly possible. An example for such a domain would be www.öko.de
short answer: yes
not only in the username but also in the domain name are allowed.
The answer is yes, but they need to be encoded specially.
Look at this. Read the part that refers to email-headers and RFC 2047.
Not yet. The IEEE plans to do this:
H-Online article: IEFT planning internationalised email addresses, here is the RfC: SMTP Extension for Internationalized Email Addresses
Quote from H-Online (as it went down):
The Internet Engineering Task Force (IETF) has published three crucial documents for the standardisation of email address headers
that include symbols outside the ASCII character set. This means that
soon you'll be able to use Chinese characters, French accents, and
German umlauts in email addresses as well as just in the body of the
message. So if your name is Zoë and you work for a company that makes
façades, you might be interested in a new email address. But
representatives of providers are already moaning. They say there would
need to be an "upgrade mania" if the Unicode standard UTF-8 is to
replace the American Standard Code for Information Interchange (ASCII)
currently used as the general email language.
RFC 5335 specifies the use of UTF-8 in practically all email headers.
Changes would have to be made to SMTP clients, SMTP servers, mail user
agents (MUAs), software for mailing lists, gateways to other media,
and everywhere else where email is processed or passed along. RFC 5336
expands the SMTP email transport protocol. At the level of the
protocol, the expansion is labelled UTF8SMTP.
A new header field will be added as a sort of "emergency parachute" to
ensure that UTF-8 emails have a soft landing if they are thrown out
before reaching the recipient by systems that have not been upgraded.
The "OldAddress" is a purely ASCII address. But OldAddress is not to
be used as a channel for a second transfer attempt, but rather to make
sure that feedback is sent home.
Finally, RFC5337 ensures that correct messages are sent pertaining to
the delivery status of non-ASCII emails. The correct address of an
unreachable addressee must be sent back, even if further transport has
been refused. The email Address Internationalization (EAI) working
group is also working on a number of "downgrade mechanisms" for
various header fields and the envelope. If possible, original header
information is to be "packaged" and preserved.
Germany's DeNIC, the registrar for the ".de" domain, is nonetheless
taking this in its stride. "There is really not much we can do",
explained DeNIC spokesperson Klaus Herzig. DeNIC is instead paying
more attention to the update that the IETF is working on for the
standard of international domains – RFC3490, or IDNA2003 as it's
sometimes known. "We are not that happy about it because there is no
backwards compatibility," Herzig explained. When the update comes,
DeNIC says it will be throwing its weight behind the symbol "ß" - also
known as estzett - which has been overlooked up to now. The German
registrar also says that it may wait a bit before switching in light
of the lack of backward compatibility. Once the new standard is
running stably and registrars and providers have adopted it, the ß
will be added.
In contrast, experts believe that Chinese registrars in China and
Taiwan will quickly implement the change for internationalised email.
Representatives of CNIC and TWNIC are authors of the standards.
Chinese users currently have to write emails in ASCII to the left of
the # and in Chinese characters to the right of it for Chinese
domains, which have already been internationalized.
(Monika Ermert)

What escaping or purification is needed for an email subject?

Sites like Facebook have the user's name in the subject line that sent you a message.
Because of this, what escaping would you do on user entered values in a message subject? Or would you just not allow anything other than a-z, 0-9, period, comma and single quotes?
You need to be careful with email headers, 8 bit chars are a bit of a no-no. (mail servers will reject them).
The proper way to do it is to MIME encode your subject lines and make sure the ASCII char \n is not in the subject line (technically multi-line subjects are possible, but I'd imagine plenty of mail clients would have problems)
See http://en.wikipedia.org/wiki/MIME#Encoded-Word for more info.
Escaping is needed if there are forbidden characters. The subject is terminated by a NL so this is the only (ASCII) character that shouldn't be put in the header.
See also rfc821
It’s the same problem with contact forms.
If you look at an email header you get e.g. this:
Subject: user123 has sent you an invite
From: "User123" <user123#example.org>
You have to make sure that user names do not resemble values of an email header. If it’s possible for a user to name himself “To: spamreceiver1#example.org, spamreceiver2#example.org, spamreceiver3#example.org, spamreceiver4#example.org” you have to clean the input.
A search for “contact form spam” should show you what to do. You should at least remove all occurrences of "To:", "Subject:", "From:" etc.