Can punycode-encoded email addresses clash with "real" addresses? - email

The problem is this: I'm using a third-party email delivery service that doesn't accept addresses with non-ASCII characters in the name (local) part, like müller@example.com.
Encoding such an address with Punycode:
http://en.wikipedia.org/wiki/Punycode
http://idnaconv.phlymail.de/index.php?decoded=m%C3%BCller%40example.com&idn_version=2008&encode=Encode+%3E%3E&lang=de
yields this address:
xn--mller-kva@example.com
And sending mail to it via the service seems to work.
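The encoding step can be reproduced with Python's built-in `punycode` codec. This is a minimal sketch, assuming we simply borrow the domain-label convention (the `xn--` prefix plus Punycode) for the local part; the helper names are hypothetical, and applying this to local parts is not standardized anywhere.

```python
def punycode_local_part(local: str) -> str:
    """Borrow the IDNA label convention ("xn--" + Punycode) for a local part.

    Non-standard: Punycode is only defined for domain labels.
    """
    if local.isascii():
        return local  # nothing to encode
    return "xn--" + local.encode("punycode").decode("ascii")


def encode_address(address: str) -> str:
    # Split on the last "@"; the domain is left untouched (assumed ASCII here).
    local, _, domain = address.rpartition("@")
    return punycode_local_part(local) + "@" + domain


print(encode_address("müller@example.com"))  # xn--mller-kva@example.com
```

The result is an ordinary ASCII string with nothing marking it as "encoded", which is exactly why the question about clashes arises.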
However, I'm worried that someone could register "xn--mller-kva@example.com" directly and thus receive emails meant for "müller@example.com".
Is such a clash possible? Are there other solutions to this problem?
UPDATE
Thanks for the answers. Here's a summary of what we learned:
Punycoding the local part of the email address works, and you can send and receive from such an encoded address (of course)
However, there are no guarantees at all that providers or mail clients will understand the encoding, or apply it automatically. Clashes are therefore possible, and the whole idea is not a good one :)
One should simply do what everyone else does, which is to not allow or accept non-ASCII name parts, as per the specification.
And finally, it turns out the third-party service prohibits such shenanigans anyway.

Non-ASCII characters are not allowed in the local part of email addresses. Period. Punycode is ONLY FOR DOMAINS, not for local parts of email addresses.
However, the IETF is very likely to adopt a standard that makes internationalized local parts possible. That standard will probably not be based on Punycode.

I got bored and was researching this tonight, and apparently this is now codified in the Extended SMTP standard, specifically SMTPUTF8 as per RFC 6531. See http://en.wikipedia.org/wiki/Extended_SMTP#SMTPUTF8
My brief experiment using emoji mailbox names returned the following error when sending via Gmail:
local-part of envelope contains utf8 but remote server did not offer SMTPUTF8
This is the same regardless of whether I used the emoji or the Punycode version of the address.
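The error above comes from the sending side noticing a non-ASCII envelope address and a receiving server that does not advertise SMTPUTF8. A minimal sketch of that check with Python's `smtplib`, assuming a plain SMTP relay on port 25 without STARTTLS or authentication; `needs_smtputf8` and `send_raw` are hypothetical helper names.

```python
import smtplib


def needs_smtputf8(*addresses: str) -> bool:
    # RFC 6531: a non-ASCII envelope address requires the SMTPUTF8 extension.
    return any(not address.isascii() for address in addresses)


def send_raw(host: str, sender: str, recipient: str,
             message: bytes, port: int = 25) -> None:
    """Send message, asking for SMTPUTF8 only when the envelope needs it."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.ehlo()
        options = []
        if needs_smtputf8(sender, recipient):
            if not smtp.has_extn("smtputf8"):
                # The same situation as the Gmail error quoted above.
                raise smtplib.SMTPNotSupportedError(
                    "remote server did not offer SMTPUTF8")
            options.append("SMTPUTF8")
        smtp.sendmail(sender, [recipient], message, mail_options=options)
```

Note that this logic only flags the emoji or umlaut form; a Punycode-style local part is plain ASCII and never triggers the SMTPUTF8 path.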

You can encode sections of mail header fields into different character encodings using a format like the following: =?UTF-8?B?w6HDq8O0?= This allows you to embed things like umlauts but I'm pretty sure it doesn't work for the actual address part.
There's no reason why you cannot use these characters to form your address: RFC 5322 defines the characters that may appear in the address part in Section 3.4, and all the characters in the encoded address above (ASCII letters, digits and hyphens) are valid. However, as another answer pointed out, it's all a little fruitless if the mail clients you are sending to cannot parse this format.
Some SMTP servers might 'accidentally' allow umlauts but since they're not within the supported character ranges I wouldn't risk it.
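The encoded-word example above can be decoded with Python's `email.header` module, and `email.utils.formataddr` shows where such encoding applies: it happily encodes a non-ASCII display name but refuses a non-ASCII address. A minimal sketch; the names and addresses are placeholders.

```python
from email.header import decode_header, make_header
from email.utils import formataddr

# Decode the RFC 2047 encoded-word from the answer above.
encoded = "=?UTF-8?B?w6HDq8O0?="
print(str(make_header(decode_header(encoded))))  # áëô

# Encoded-words work for the display name...
print(formataddr(("Müller", "mueller@example.com")))

# ...but the address itself must stay ASCII.
try:
    formataddr(("Müller", "müller@example.com"))
except UnicodeEncodeError:
    print("address part must be ASCII")
```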

The only standard way to send non-US-ASCII characters in the local part of an email address is through RFC 6532 (Internationalized Email Headers) and RFC 6531 (SMTP extension for SMTPUTF8).
As far as I know, there is no standard way to encode non-US-ASCII characters in the local part of an email address. Notably:
Punycode is for domain names only, not the local part. You can have a local part that happens to look like the Punycode encoding of some string, but it should be displayed in its encoded form. If a mail program decides to display it Punycode-decoded, that is non-standard behavior.
The encoded-word mechanism mentioned in one of the answers (the =?utf-8?Q?foobar?= thing) is not applicable to the local part of an email address, only to the display name of a mailbox (which is something different but related, i.e. the thing your mail program might display instead of the email address).
In the end, this means that müler@example.com and xn--mler-0ra@example.com are two completely unrelated email addresses. They would have the same meaning if they were domains, but they are not, so they can collide.
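To see why that matters, imagine a client that (non-standardly) Punycode-decodes `xn--` local parts for display. The sketch below is hypothetical behavior, not something any particular mail program is known to do; it only shows that two different mailboxes would then render identically.

```python
def naive_display(address: str) -> str:
    """Hypothetical NON-standard client behavior: decode an "xn--" local
    part for display, as if it were a domain label."""
    local, _, domain = address.rpartition("@")
    if local.lower().startswith("xn--"):
        try:
            local = local[4:].encode("ascii").decode("punycode")
        except UnicodeError:
            pass  # not valid Punycode; show it unchanged
    return local + "@" + domain


a = "müler@example.com"
b = "xn--mler-0ra@example.com"
print(a == b)                                # False: two different mailboxes
print(naive_display(a) == naive_display(b))  # True: identical on screen
```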
Theoretically, you could hope that by now (2019) all mail servers support SMTPUTF8 and all clients support internationalized mail, but sadly I would not count on it if it's important.
By the way, the local part of an email address happens to be the only thing in the mail standard(s) where you might want non-US-ASCII characters and there is no way to encode them (as far as I know). All other parts have encoded words, Punycode, percent-encoding, Base64, quoted-printable or some other encoding mechanism.

Did a few tests. Umlauts in the local part seem to work in certain setups: neither my MUA (Claws) nor the outbound relay (Exim) nor the receiving MTA (Postfix) complained or did any Punycode conversion. Providers like Gmail and Hotmail, however, don't allow umlauts at all (tested webmail as well as direct incoming and outgoing SMTP). I didn't find any documentation about this case that suggests Punycoding local parts. So, since it's not documented and no one does it, there is no clashing problem :-)
Conclusion: you probably shouldn't accept umlauts in the local part in the first place, nor even try to send an email to those addresses. (If the big players don't do it and it's not documented/supported by an RFC, why should you?)
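Refusing such addresses at input time can be as simple as requiring an unquoted RFC 5322 dot-atom local part. This is a deliberately strict sketch: it also rejects legal but rare quoted local parts, and the helper name is hypothetical.

```python
import re

# RFC 5322 "atext": ASCII letters, digits and a fixed set of symbols; no umlauts.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def acceptable_local_part(local: str) -> bool:
    """Accept only unquoted dot-atom local parts (quoted strings such as
    "john doe"@example.com are rejected as a simplification)."""
    return DOT_ATOM.fullmatch(local) is not None


print(acceptable_local_part("mueller"))  # True
print(acceptable_local_part("müller"))   # False
```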

Related

How to check if an email address exists or not without sending an email? [duplicate]

This question already has answers here:
How do I check if an email address is valid without sending anything to it?
I have a bunch of email IDs and I want to check whether each of them is valid.
How can I check this without sending any mail?
Syntax Validation: The most obvious part, yet the one people know least about. There's more to email syntax validation than the simple PHP RegEx rule you're using. There are the IETF standards (all the RFCs), but you'll also have to look at ISP-specific syntax checking, quoted words, domain literals, non-ASCII domains, etc.
Disposable & Free Emails: Next, before you use any server side code to check the given email address, it's recommended to check whether or not you're dealing with disposable emails (e.g. mailinator.com) or free emails (Gmail, Yahoo!, etc.) and act accordingly.
Obvious Typos: Now is the time to check for obvious misspellings and typos (e.g. user@gnail.com would be corrected to user@gmail.com).
DNS validation, including MX record(s) lookup: Verify the DNS MX records for the given domain.
SMTP connection, catch-all check: Now for the meaty part, but also the riskiest. Validating email addresses by establishing and then aborting an SMTP connection to the given mail server is still the only way to really find out whether a mailbox actually exists. However, if you do it the wrong way, you will very quickly be blacklisted and considered a spammer.
From:
https://www.quora.com/Is-there-a-way-to-check-if-an-e-mail-address-is-valid-without-e-mailing-it
mailboxlayer API - my clear favourite. It's basically free, safe, and comes with each and every validation tool necessary in order to properly validate email addresses.
Kickbox.io - great product, but this will cost you a little more. Advantage: They also offer list-cleaning.
email-validator.net - great, but costly.
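Two of the steps listed above, the typo check and the SMTP/catch-all check, can be sketched with Python's standard library. Assumptions: `KNOWN_DOMAINS` is a hypothetical shortlist, `probe@example.org` is a placeholder sender, and the MX host has already been looked up. As the quoted advice warns, probing servers like this is exactly what gets you blacklisted, so use it sparingly.

```python
import difflib
import smtplib
import uuid
from typing import Optional

# Hypothetical shortlist of popular domains used for typo suggestions.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]


def suggest_domain(address: str) -> Optional[str]:
    """Suggest a likely intended domain for an obvious typo, else None."""
    domain = address.rpartition("@")[2].lower()
    if domain in KNOWN_DOMAINS:
        return None
    matches = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=0.8)
    return matches[0] if matches else None


def classify_rcpt(code: int) -> str:
    """Map an SMTP RCPT TO reply code to a coarse verdict."""
    if code in (250, 251):
        return "accepted"
    if 500 <= code < 600:
        return "rejected"
    return "unknown"  # 4xx (try later), 252 (cannot verify), etc.


def probe_mailbox(mx_host: str, address: str,
                  sender: str = "probe@example.org") -> dict:
    """Ask about the real address and a random one, then quit before DATA.

    If the random local part is also accepted, the domain is probably a
    catch-all and the verdict for the real address says little.
    """
    domain = address.rpartition("@")[2]
    random_rcpt = f"{uuid.uuid4().hex}@{domain}"
    with smtplib.SMTP(mx_host, 25, timeout=15) as smtp:  # QUIT sent on exit
        smtp.ehlo_or_helo_if_needed()
        code, reply = smtp.mail(sender)
        if code != 250:
            raise smtplib.SMTPSenderRefused(code, reply, sender)
        real_code, _ = smtp.rcpt(address)
        random_code, _ = smtp.rcpt(random_rcpt)
        smtp.rset()
    return {
        "address": classify_rcpt(real_code),
        "catch_all": classify_rcpt(random_code) == "accepted",
    }
```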

Postfix, isolate multiple sites' mail headers so if one gets blocked/blacklisted, the others sharing the server don't also get blacklisted

I have a few separate sites on a server with a single IP.
The sites shouldn't ever send spam, but the customers are free to send emails from their sites so I have no way to prevent them from doing so.
What I'd like to do is when sending the emails via postfix, somehow separate the sites in the headers sent out.
Previously I've set up an IP for each, but I'm trying to avoid doing this.
I've also found that with /etc/postfix/header_checks I can remove headers, but I'm not sure whether removing specific headers will cause issues.
One thing to consider here is that blacklisting is usually based on IP addresses. Separate headers won't help much there. The reason for this is that (a) it's simple and (b) many spam sending servers have been compromised and taken over by an attacker, using custom mail sending software, so headers don't matter anymore.
Different headers might still have their merit though, as spamfilters will check those. It just won't help if your server's IP gets blacklisted.
I guess rolling out DKIM might help here: it would give you an artificial separation of domains by using a different domain key for each. There are some good tutorials on the net on how to set it up with OpenDKIM.
A better solution, used by big mail providers like GMX, is to send mail from a separate IP if it looks like it's spam. The setup for this is a little complicated, as it requires you to scan outgoing mail with SpamAssassin (or something similar) and to route mail depending on the resulting spam score. Not an easy task. Marking spam as such, without sending it through a separate IP, might be enough to convince the other side that you are trying to prevent spam being sent from your server, but this really depends on their spam filter.
The way your server identifies itself during an SMTP conversation is through the HELO command. The smtp_helo_name parameter specifies the name used there. One could try to set up a transport mechanism that uses a different name for each sender domain. I'm honestly not sure how difficult that would be.
If you are still set on changing headers: the header_checks tables allow you not only to remove headers but also to alter them via regular expressions.
Use the REPLACE command to do so. Example:
/^(Message-ID:.*)@your-domain\.example(.*)/ REPLACE ${1}@other-domain.example${2}
I'd advise against it, though. It provides too little gain for the effort of finding and setting up the right rules.

Position of MIME in the Networking stack

Based on what I found on the internet, MIME (Multipurpose Internet Mail Extensions, now Internet Media Type (?)) is a way to describe file types (a header used by several protocols).
So MIME itself is not a protocol, but rather an extension used by other protocols, right?
This means that the extension is used at the application layer by applications, with no protocol doing anything other than carrying the MIME header.
So, if I send a mail with an mp3 attachment, does SMTP (or another application-layer protocol) recognize that this is an mp3 attachment, or is it solely the application's duty to recognize the file? In that sense, MIME cannot be called an extension to SMTP but rather a feature used by applications.
If SMTP does not recognize that this is a different kind of file, how will it properly store it on the mail server? (E.g. an MPEG video file needs a particular format to be stored; how will the mail server store it without giving it any special treatment?)
Sorry if my questions sound a bit vague but I want to get an idea of how different protocols (especially, SMTP) use MIME.
Thanks for your help.
RFC 822 email was originally purely plain-text, 7-bit US-ASCII. MIME specifies a facility for encapsulating other media types in email containers. It does not specify any changes to SMTP (although e.g. the 8BITMIME ESMTP extension is useful for simplifying transport of MIME messages). Thus, it is an extension of an existing protocol, not a distinct protocol in its own right. This is also demonstrated by the fact that other protocols -- notably, HTTP -- have incorporated (parts of) MIME for tagging of content types and encodings.
An Internet Media Type is only one aspect of what MIME used to codify; the mechanisms for specifying character sets and encodings are still defined in MIME proper.
Traditionally, the mail server simply stores the bare RFC 822 message in its message store; it is the responsibility of the mail client to parse and possibly manipulate any MIME structure in the body for display and interaction. (The fact that RFC 822 has been superseded by RFC 2822 and then RFC 5322 has not fundamentally changed the actual mail message format.)
Some servers deviate from this model; for example, Microsoft Exchange seems to parse all incoming messages in order to borg them into its internal format, somewhat to the detriment of its interoperability with standard tools, and the sanity of those few of us who require reliable, felicitous access to our actual email.
The SMTP protocol itself knows nothing about the MIME format, but the SMTP server has to implement at least basic RFC 822 support in order to add the Received headers; it does not, however, need to implement MIME.
How does the server save the file to disk? The same way it received it from the client over the TCP/IP stream. It just saves the raw bytes sent (with the addition of the Received header I mentioned).
In other words, you are way over-thinking this. The SMTP server doesn't have to know anything about mp3 file attachments or anything else because the MIME format (it's not a protocol) is just a way to serialize the mp3 data in a message.
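A small demonstration with Python's `email` package: attaching binary "mp3" data produces a message that is plain ASCII text, which is all an SMTP server ever has to relay and store. Parsing the MIME structure back out is done by the client. The addresses and the fake audio bytes are placeholders.

```python
import email
import email.policy
from email.message import EmailMessage

mp3_bytes = b"ID3\x03\x00" + bytes(range(256))  # stand-in for real MP3 data

msg = EmailMessage()
msg["From"] = "alice@example.com"
msg["To"] = "bob@example.com"
msg["Subject"] = "Song"
msg.set_content("See attached.")
msg.add_attachment(mp3_bytes, maintype="audio", subtype="mpeg",
                   filename="song.mp3")

raw = msg.as_bytes()  # these bytes are what SMTP carries and the server stores
print(raw.isascii())  # True: base64 turned the binary data into ASCII text

# The client, not the server, parses the MIME structure back out.
parsed = email.message_from_bytes(raw, policy=email.policy.default)
attachment = next(parsed.iter_attachments())
print(attachment.get_content_type(), attachment.get_content() == mp3_bytes)
```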

Detect non-existing e-mail address without sending a message

I've read everything I could find on verifying e-mail addresses. The widely encountered solution is this, and it doesn't work (for one, actual nslookup output differs significantly from what the article shows, so I don't get an actual address to telnet to).
But then I thought: I don't need to verify the address. I just want to detect clearly bogus addresses (addresses for which sending a message will yield a "delivery failed" response). Is this possible in principle, and can it be implemented using C++ sockets or the Java networking API in particular?
Depending on which operating system and tools you use, you could use dig instead of nslookup to verify the recipient's domain and whether it is recorded in the DNS with a meaningful MX (mail exchange) record. For foo@bar.com:
$ dig bar.com MX
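The same lookup can be scripted by running dig and parsing its `+short` output, which prints one "preference host" pair per line. A minimal sketch, assuming `dig` is installed and on PATH; `parse_dig_mx` and `mx_hosts` are hypothetical names. Keep in mind that when a domain has no MX record, SMTP falls back to the domain's address record (RFC 5321), so an empty list alone does not prove the domain cannot receive mail.

```python
import subprocess
from typing import List, Tuple


def parse_dig_mx(output: str) -> List[Tuple[int, str]]:
    """Parse `dig +short <domain> MX` output ("10 mx1.example.com.") into
    (preference, host) pairs, lowest preference (tried first) first."""
    records = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            records.append((int(parts[0]), parts[1].rstrip(".")))
    return sorted(records)


def mx_hosts(domain: str) -> List[Tuple[int, str]]:
    # Runs the external dig tool; raises CalledProcessError if dig fails.
    result = subprocess.run(["dig", "+short", domain, "MX"],
                            capture_output=True, text=True, check=True)
    return parse_dig_mx(result.stdout)
```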
The possibilities for detecting bogus email addresses are typically limited, though. Availability largely depends on how "generously" the MTA offers this information; most don't, these days. The SMTP protocol includes some verbs you could then use, such as VRFY. On the other hand, spammers could do just that, hence … That's one reason why a mail loop is run in order to detect valid email addresses fairly reliably: embedding, as I'm sure you know, a verification string to be sent back, or passed via URL to some web service.
SMTP, being a text protocol, would be used via some "transport layers" underlying higher level APIs like JavaMail. I'd look for programmability of these with the programming language used. Typically, there is some socket library, for sending and retrieving lines of text.
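At the socket level, the exchange is just CRLF-terminated lines of text; the main subtlety is that a reply may span several lines ("250-..." continues, "250 ..." ends it). Below is a minimal Python sketch of issuing VRFY over a raw socket; the same structure carries over to C++ or Java sockets. Assumptions: port 25 is reachable, `client.example.org` is a placeholder HELO name, and, as noted above, many servers answer 252 or refuse VRFY (502) rather than reveal anything.

```python
import socket
from typing import BinaryIO, List, Tuple


def read_reply(stream: BinaryIO) -> Tuple[int, List[str]]:
    """Read one SMTP reply; "250-..." lines continue, "250 ..." ends it."""
    lines = []
    while True:
        raw = stream.readline()
        if not raw:
            raise ConnectionError("server closed the connection")
        line = raw.decode("ascii", errors="replace").rstrip("\r\n")
        lines.append(line[4:])
        if len(line) < 4 or line[3] != "-":
            return int(line[:3]), lines


def send_command(stream: BinaryIO, command: str) -> Tuple[int, List[str]]:
    stream.write(command.encode("ascii") + b"\r\n")  # SMTP lines end in CRLF
    stream.flush()
    return read_reply(stream)


def vrfy(mx_host: str, address: str,
         helo_name: str = "client.example.org") -> int:
    """Return the server's reply code for VRFY <address>."""
    with socket.create_connection((mx_host, 25), timeout=15) as sock:
        with sock.makefile("rwb") as stream:
            read_reply(stream)  # 220 greeting
            send_command(stream, f"HELO {helo_name}")
            code, _ = send_command(stream, f"VRFY {address}")
            send_command(stream, "QUIT")
            return code
```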

Is it safe to send 8-bit emails?

I would like to know if it is safe to send emails with 8-bit characters or if it is still needed to use quoted-printable or base64 encoding.
The 8BITMIME extension is now 20 years old. Are there SMTP servers or mail clients that still are not 8-bit clean? Is there any impact on email deliverability when sending 8-bit emails?
I did not find any numbers but it looks like it is now quite safe to send emails with 8-bit body. But since the big players like Gmail still encode emails there might be some servers that still are not 8-bit clean.
However while sending an email with an 8-bit body might be safe, sending it with 8-bit headers is not.
RFC 2822, which was the standard until late 2008, prohibited non-ASCII characters in headers.
RFC 6532 proposed a standard for 8-bit headers but it is quite recent (2012) and does not seem widely implemented yet.
So sending unencoded 8-bit emails is currently not safe.
There are still SMTP servers that haven't been updated to support 8BITMIME, so yes, you still need to check for the extension.
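Checking for the extension and falling back to an encoded body can look like this in Python. This is a sketch under stated assumptions: a plain SMTP relay without TLS or authentication, placeholder addresses, and ASCII-only headers (per the advice above); `build_message` and `send` are hypothetical helpers.

```python
import smtplib
from email.message import EmailMessage


def build_message(text: str, eight_bit: bool) -> EmailMessage:
    msg = EmailMessage()
    msg["From"] = "sender@example.com"   # placeholder addresses
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Test"              # keep headers ASCII
    # "8bit" sends raw UTF-8; "quoted-printable" keeps the body 7-bit ASCII.
    msg.set_content(text, cte="8bit" if eight_bit else "quoted-printable")
    return msg


def send(host: str, text: str, port: int = 25) -> None:
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.ehlo()
        eight_bit = smtp.has_extn("8bitmime")
        msg = build_message(text, eight_bit)
        options = ["BODY=8BITMIME"] if eight_bit else []
        smtp.sendmail(str(msg["From"]), [str(msg["To"])],
                      msg.as_bytes(), mail_options=options)
```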