Is root@[127.1] a syntactically valid e-mail address?

Why? Why not?

You need to check RFC 5322, section 3.4.1.
This specification is a
revision of Request For Comments (RFC) 2822, which itself superseded
Request For Comments (RFC) 822, "Standard for the Format of ARPA
Internet Text Messages", updating it to reflect current practice and
incorporating incremental changes that were specified in other RFCs.
I ran your email address through http://isemail.info/, which gave the following result:
The general result is: The address is only valid according to the broad definition of RFC 5322. It is otherwise invalid.
The specific diagnosis is: The domain literal is not a valid RFC 5321 address literal
Here is the relevant passage from the email RFCs:
domain-literal = [CFWS] "[" *([FWS] dtext) [FWS] "]" [CFWS]
(RFC 5322 section 3.4.1)

It depends on whether you mean addresses in the header (RFC 5322) or envelope addresses (RFC 5321), and in the latter case, whether you include <>, everything between <> (i.e. the source route), or just Mailbox.
It's valid according to RFC 5322, but RFC 5322 allows loads of fun things like comments! and unicorns! and cake! and ponies!. It's just about possible to parse them using Perl's "regular" expressions: Mail::RFC822::Address.
It's syntactically invalid according to RFC 5321 Section 4.1.3 since the grammar only allows address literals of the form 1.2.3.4 or with a prefix of the form "standard-tag:" (e.g. [IPv6:::1]). I've assumed you meant "Mailbox", i.e. everything between <> but not including the source route.
I'd use the latter definition, since an e-mail address isn't much good if my SMTP server won't accept it. (Yes, this is a bit of a horrible definition, but I don't think the internet will move away from SMTP any time soon.)
(Additionally, there's RFC 5336 a.k.a. "UTF8SMTP". I'm not aware of anyone who uses this.)
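The RFC 5321 address-literal rule mentioned above can be sketched in Python. This is a minimal illustration, not a full RFC 5321 parser: the function name is hypothetical, only the IPv4 and `IPv6:` forms are handled (other standardized tags are rejected), and the standard `ipaddress` module may be slightly stricter than the RFC grammar (for example about leading zeros in newer Python versions).

```python
import ipaddress


def is_rfc5321_address_literal(domain: str) -> bool:
    """Return True if `domain` looks like an RFC 5321 address literal."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return False
    body = domain[1:-1]
    try:
        if body.lower().startswith("ipv6:"):
            # "%" would introduce a zone ID, which the SMTP grammar does not allow.
            if "%" in body:
                return False
            ipaddress.IPv6Address(body[5:])
        else:
            # IPv4Address insists on exactly four decimal octets,
            # so the shorthand "127.1" is rejected.
            ipaddress.IPv4Address(body)
    except ipaddress.AddressValueError:
        return False
    return True


for candidate in ("[127.1]", "[127.0.0.1]", "[IPv6:::1]"):
    print(candidate, is_rfc5321_address_literal(candidate))
```

Under these assumptions `[127.1]` is rejected while `[127.0.0.1]` and `[IPv6:::1]` are accepted, which matches the diagnosis in this answer.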

No. RFC 2822 allows an IP address to be used as the domain, but it must be a valid IP address.
Your example should be root@[127.0.0.1].

According to RFC-822 as you mention in the tags, yes, it is syntactically valid, because the grammar allows it. These are the relevant rules:
addr-spec = local-part "@" domain ; global address
domain = sub-domain *("." sub-domain)
sub-domain = domain-ref / domain-literal
domain-literal = "[" *(dtext / quoted-pair) "]"
dtext = <any CHAR excluding "[", ; => may be folded
"]", "\" & CR, & including
linear-white-space>
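To see why the RFC 822 grammar accepts `[127.1]`, the domain-literal rule quoted above can be transcribed into a regular expression. This is a rough sketch under the assumption that folding is ignored; the names `DOMAIN_LITERAL_822` and `is_rfc822_domain_literal` are made up for illustration.

```python
import re

# domain-literal = "[" *(dtext / quoted-pair) "]"
# dtext: any character except "[", "]", "\" and CR.
# quoted-pair: "\" followed by any single character.
DOMAIN_LITERAL_822 = re.compile(r"\[(?:[^\[\]\\\r]|\\.)*\]", re.DOTALL)


def is_rfc822_domain_literal(text: str) -> bool:
    # RFC 822 CHAR is 7-bit ASCII, so non-ASCII input is rejected up front.
    return text.isascii() and DOMAIN_LITERAL_822.fullmatch(text) is not None


for candidate in ("[127.1]", "[127.0.0.1]", "[a[b]"):
    print(candidate, is_rfc822_domain_literal(candidate))
```

The grammar only constrains which characters may appear between the brackets; it says nothing about the content being a usable IP address, which is why `[127.1]` passes here.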

No.
E-mail validity has some broad definitions. Split the address into two sections: local (before the @ sign) and domain (after the @ sign). The local part may be alphanumeric with the special characters '.', '-' and '_', and it cannot contain contiguous periods.
The domain part must meet the definition of a host name, or be an IP address surrounded by square brackets.
As your example doesn't meet the requirements for a valid host name (foo.bar), and it doesn't contain a valid IP address surrounded by square brackets, it is not a valid e-mail address.
Check out the following e-mail validator code (minus the ip address validation bit) which will validate an e-mail address. This can be easily retrofitted to work with ip-address domain names too.
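The rules described in this answer can be sketched in Python as follows. This is a minimal illustration and not the validator the answer refers to: the function name is hypothetical, the host name check assumes at least two labels (as in foo.bar), and the bracketed form only accepts IPv4 addresses.

```python
import ipaddress
import re

# Local part: letters, digits, ".", "-" and "_" (the set this answer allows).
LOCAL_RE = re.compile(r"[A-Za-z0-9._-]+")
# Host name label: letters, digits and inner hyphens, 1 to 63 characters.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_simple(address: str) -> bool:
    local, sep, domain = address.rpartition("@")
    # Reject a missing "@", disallowed characters and contiguous periods.
    if not sep or not LOCAL_RE.fullmatch(local) or ".." in local:
        return False
    if domain.startswith("[") and domain.endswith("]"):
        try:
            ipaddress.IPv4Address(domain[1:-1])
        except ipaddress.AddressValueError:
            return False
        return True
    labels = domain.split(".")
    # Assumption: a host name needs at least two labels, like foo.bar.
    return len(labels) >= 2 and all(LABEL_RE.fullmatch(label) for label in labels)


for candidate in ("root@[127.1]", "root@[127.0.0.1]", "root@foo.bar"):
    print(candidate, is_valid_simple(candidate))
```

These rules are much narrower than what the RFCs permit, so the sketch rejects many addresses that are formally valid.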

Related

The precise format of Content-Id header

I'm really confused when it comes to the format of Content-Id headers in message parts.
It seems to me that only RFC 2045 covers the format of the header, however briefly:
In constructing a high-level user agent, it may be desirable to allow
one body to make reference to another. Accordingly, bodies may be
labelled using the "Content-ID" header field, which is syntactically
identical to the "Message-ID" header field:
id := "Content-ID" ":" msg-id
Like the Message-ID values, Content-ID values must be generated to be
world-unique.
RFC 2822 explains the format of a msg-id token like so:
The message identifier (msg-id) is similar in syntax to an angle-addr
construct without the internal CFWS.
message-id = "Message-ID:" msg-id CRLF
in-reply-to = "In-Reply-To:" 1*msg-id CRLF
references = "References:" 1*msg-id CRLF
msg-id = [CFWS] "<" id-left "@" id-right ">" [CFWS]
id-left = dot-atom-text / no-fold-quote / obs-id-left
id-right = dot-atom-text / no-fold-literal / obs-id-right
no-fold-quote = DQUOTE *(qtext / quoted-pair) DQUOTE
no-fold-literal = "[" *(dtext / quoted-pair) "]"
Long story short: it includes the at ('@') symbol, just like the Message-Id header of a message. However, almost all reader-friendly articles on the MIME format give examples of Content-Id without the at symbol (including not-really-global identifiers like myimagecid or inlineimage001, as well as randomly generated UUIDs without the at symbol). They would surely stress the importance of the '@' symbol if it were necessary, just like they do with the Message-Id header, right? Right?
I've run some tests on real-world email clients to see how they compose emails with embedded inline images:
Thunderbird generates identifiers with the at symbol. Example: part1.12345678.12345678@domain.example.com
Gmail generates identifiers without such symbol and with no domain part. Example: ii_abc1234x0_12345ab12abcdefa
I didn't test any more email clients (if someone did, it'd be great to complete the list above), but these two already show the striking difference. Google not obeying RFC standards? It sure looks smelly and I want to know whether that's because I missed something, or because the format isn't really that important after all (which in the long run feels rather disturbing). I'm also interested in checking how many popular email clients actually discard the 'at' symbol.
Go by what the spec says, not by what some mail clients do.
So yes, a Content-Id header should have a value that conforms to the specification and therefore should contain an '@' symbol.
The world of email is a broken hell hole of many different mail clients and servers doing their own thing and not respecting the standards.
As someone who has written mail software for the past 17 years, I can assure you, this is not the only place that Google deviates from the specs.
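For what it's worth, here is a small sketch of producing a spec-conforming Content-ID with Python's standard `email` package. `make_msgid()` generates a value in the msg-id form, including the at symbol and the angle brackets. The domain `example.com`, the subject and the image bytes are placeholders, not anything taken from the question.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# make_msgid() returns "<unique-part@domain>", angle brackets included.
image_cid = make_msgid(domain="example.com")  # placeholder domain

msg = EmailMessage()
msg["Subject"] = "Inline image"
msg.set_content("See the HTML part.")
# The cid: URL in the HTML uses the identifier without the angle brackets.
msg.add_alternative(
    f'<html><body><img src="cid:{image_cid[1:-1]}"></body></html>',
    subtype="html",
)
# Attach placeholder bytes as a related part labelled with the Content-ID.
msg.get_payload()[1].add_related(
    b"\x89PNG placeholder bytes", maintype="image", subtype="png", cid=image_cid
)

print(image_cid)
```

The design point is simply to let a library build the identifier instead of hand-writing something like myimagecid.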

RFC 5322 email format validation

How can I check whether the email addresses generated by my code are valid according to RFC 5322?
Here's a PCRE regular expression (taken from a PHP library) that will validate according to RFC 5322:
'/^(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){255,})(?!(?>(?1)"?(?>\\\[ -~]|[^"])"?(?1)){65,}@)((?>(?>(?>((?>(?>(?>\x0D\x0A)?[\t ])+|(?>[\t ]*\x0D\x0A)?[\t ]+)?)(\((?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-\'*-\[\]-\x7F]|\\\[\x00-\x7F]|(?3)))*(?2)\)))+(?2))|(?2))?)([!#-\'*+\/-9=?^-~-]+|"(?>(?2)(?>[\x01-\x08\x0B\x0C\x0E-!#-\[\]-\x7F]|\\\[\x00-\x7F]))*(?2)")(?>(?1)\.(?1)(?4))*(?1)@(?!(?1)[a-z\d-]{64,})(?1)(?>([a-z\d](?>[a-z\d-]*[a-z\d])?)(?>(?1)\.(?!(?1)[a-z\d-]{64,})(?1)(?5)){0,126}|\[(?:(?>IPv6:(?>([a-f\d]{1,4})(?>:(?6)){7}|(?!(?:.*[a-f\d][:\]]){8,})((?6)(?>:(?6)){0,6})?::(?7)?))|(?>(?>IPv6:(?>(?6)(?>:(?6)){5}:|(?!(?:.*[a-f\d]:){6,})(?8)?::(?>((?6)(?>:(?6)){0,4}):)?))?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?9)){3}))\])(?1)$/isD'
Unlike Peter's answer it does allow for single-label domain names (which are syntactically valid) and IPv6 address literals.
However, I'd strongly suggest to instead validate according to RFC 5321 which doesn't allow for comments or folding white space (which are semantically invisible and so not actually a part of the email address) or for obsolete local parts (which can just be re-written as non-obsolete quoted strings):
'/^(?!(?>"?(?>\\\[ -~]|[^"])"?){255,})(?!"?(?>\\\[ -~]|[^"]){65,}"?@)(?>([!#-\'*+\/-9=?^-~-]+)(?>\.(?1))*|"(?>[ !#-\[\]-~]|\\\[ -~])*")@(?!.*[^.]{64,})(?>([a-z\d](?>[a-z\d-]*[a-z\d])?)(?>\.(?2)){0,126}|\[(?:(?>IPv6:(?>([a-f\d]{1,4})(?>:(?3)){7}|(?!(?:.*[a-f\d][:\]]){8,})((?3)(?>:(?3)){0,6})?::(?4)?))|(?>(?>IPv6:(?>(?3)(?>:(?3)){5}:|(?!(?:.*[a-f\d]:){6,})(?5)?::(?>((?3)(?>:(?3)){0,4}):)?))?(25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)(?>\.(?6)){3}))\])$/iD'
The regex below gets it about 98% right. It doesn't validate the following:
postbox@com
admin@mailserver1
user@[IPv6:2001:db8:1ff::a0b:dbd0]
But it covers everything else:
^(([^<>()[\\]\\.,;:\\s@\"]+(\\.[^<>()[\\]\\.,;:\\s@\"]+)*)|(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$
Note: This is copied directly from some production Golang code, so the backslashes and double quotes are escaped.
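To check the claim about the three rejected addresses, the regex above can be tried out in Python. This is a hedged transcription rather than the original Go code: the Go string escapes are removed, the usual `@` separator is used, and `[` is escaped inside the character classes. The name `SIMPLE_EMAIL_RE` is made up for this sketch.

```python
import re

# Python transcription of the regex above (Go string escapes removed).
SIMPLE_EMAIL_RE = re.compile(
    r'^(([^<>()\[\]\.,;:\s@"]+(\.[^<>()\[\]\.,;:\s@"]+)*)|(".+"))@'
    r'((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])'
    r'|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$'
)


def matches(address: str) -> bool:
    return SIMPLE_EMAIL_RE.match(address) is not None


samples = [
    "user@example.com",
    "postbox@com",
    "admin@mailserver1",
    "user@[IPv6:2001:db8:1ff::a0b:dbd0]",
]
for address in samples:
    print(address, matches(address))
```

The domain branch requires at least one dot followed by two or more letters, and the bracketed branch only knows dotted IPv4, which explains why the three listed addresses fail.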
Email Regex as per RFC 5322 Policy
After much struggle, I wrote a regex that validates all the cases per RFC 5322 except one:
(1) admin@mailserver1 (local domain name with no TLD, although ICANN highly discourages dotless email addresses)
^(?=.{1,64}#)((?:[A-Za-z0-9!#$%&'*+-/=?^\{\|\}~]+|"(?:\\"|\\\\|[A-Za-z0-9\.!#\$%&'\*\+\-/=\?\^_{|}~ (),:;<>#[].])+")(?:.(?:[A-Za-z0-9!#$%&'*+-/=?^\{\|\}~]+|"(?:\\"|\\\\|[A-Za-z0-9\.!#\$%&'\*\+\-/=\?\^_{|}~ (),:;<>#[].])+")))#(?=.{1,255}.)((?:[A-Za-z0-9]+(?:(?:[A-Za-z0-9-][A-Za-z0-9])?).)+[A-Za-z]{2,})|(((0|[1-9A-Fa-f][0-9A-Fa-f]{0,3}):){0,6}(0|)])$
For a clear breakdown of this regex, see https://regex101.com/r/7u0dze/1

Are international characters (e.g. umlaut characters) valid in the local part of email addresses?

Are German umlauts (ä, ö, ü) and the sz-character (ß) valid in the local part of an email address?
For example, take this email address: björn.nußbaum@trouble.org
RFC 5322 says quite clearly that umlauts (and other international characters) aren't allowed. Section 3.4.1 defines the local part as follows:
local-part = dot-atom / quoted-string / obs-local-part
So what does dot-atom mean? It's described in section 3.2.3. Long story short: printable US-ASCII characters, not including specials.
So in the whole RFC 5322 I can't see anything regarding international characters.
Or is RFC 5322 already obsolete? (RFC 822 -> RFC 2822 -> RFC 5322)
Update:
The important point for me is: What's the current standard? International characters allowed or not?
RFC 5322 is marked as DRAFT STANDARD. So I think that's the most recent source to rely on, isn't it?
Efran mentioned that RFC 5336 allows international characters. But RFC 5336 is marked as EXPERIMENTAL, so that's not interesting for me.
Yes, they are valid characters as long as the mail exchanger responsible for the email address supports the UTF8SMTP extension, discussed in RFC 5336. Beware that only a small portion of the mail exchangers out there support internationalized email addresses.
Both our email validation component for Microsoft .NET and our REST email validation service, for example, allow UTF8 characters in the local part of an email address but will mark it as invalid if its related mail exchanger does not support the aforementioned extension.
https://www.rfc-editor.org/rfc/rfc5322#section-3.4.1 is your latest standards track reference. Generally it is not advisable to use characters which require quoting, due to the outrageously high number of non-conformant MTAs out there. Such emails are bound to get lost in the long run.
As friendly advice, this table is pretty useful too (from Jochen Topf, titled "Characters in the local part of an email address"): https://www.jochentopf.com/email/chars.html
RFC 6531 obsoletes RFC 5336, and it is a "PROPOSED STANDARD":
https://www.rfc-editor.org/rfc/rfc6531
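In practice, the question of whether an address like the one above can be used comes down to whether it needs the SMTPUTF8 extension and whether the receiving server offers it. A minimal sketch, assuming Python's standard `smtplib`; both function names are hypothetical, and `can_deliver` expects a connection on which EHLO has already been sent.

```python
import smtplib


def needs_smtputf8(address: str) -> bool:
    """True if the address contains characters outside US-ASCII."""
    return not address.isascii()


def can_deliver(server: smtplib.SMTP, address: str) -> bool:
    # has_extn() reflects the extensions from the server's EHLO reply,
    # so call server.ehlo() before relying on this check.
    return not needs_smtputf8(address) or server.has_extn("smtputf8")


print(needs_smtputf8("björn.nußbaum@trouble.org"))
print(needs_smtputf8("bjoern.nussbaum@trouble.org"))
```

A plain ASCII address passes regardless of the server, while an address with umlauts is only deliverable when the server advertises SMTPUTF8.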

Can an email address contain international (non-english) characters?

If it's possible, should I accept such addresses from users, and what problems should I expect when sending mail to them?
Officially, per RFC 6532 - Yes.
For a quick explanation, check out wikipedia on the subject.
Update 2015: Use RFC 6532
The experimental RFC 5335 has been obsoleted by RFC 6532, and
the latter is "Category: Standards Track",
making it the standard.
Section 3.2 (Syntax Extensions to RFC 5322) updates most text fields to
allow (proper) UTF-8.
The following rules extend the ABNF syntax defined in [RFC5322] and
[RFC5234] in order to allow UTF-8 content.
VCHAR =/ UTF8-non-ascii
ctext =/ UTF8-non-ascii
atext =/ UTF8-non-ascii
qtext =/ UTF8-non-ascii
text =/ UTF8-non-ascii
; note that this upgrades the body to UTF-8
dtext =/ UTF8-non-ascii
The preceding changes mean that the following constructs now
allow UTF-8:
1. Unstructured text, used in header fields like
"Subject:" or "Content-description:".
2. Any construct that uses atoms, including but not limited
to the local parts of addresses and Message-IDs. This
includes addresses in the "for" clauses of "Received:"
header fields.
3. Quoted strings.
4. Domains.
Note that header field names are not on this list; these are still
restricted to ASCII.
Please note the explicit inclusion of Domains.
And the explicit exclusion of header names.
Also note the remark about NFKC:
The UTF-8 NFKC normalization form SHOULD NOT be used because
it may lose information that is needed to correctly spell
some names in some unusual circumstances.
And from the start of Section 3:
Also note that messages in this format require the use of the
SMTPUTF8 extension [RFC6531] to be transferred via SMTP.
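The NFKC remark quoted above is easy to demonstrate with Python's `unicodedata` module. The sample strings are made up for illustration. NFC only recomposes characters, while NFKC also replaces compatibility characters, which is the information loss the RFC warns about.

```python
import unicodedata

# "ﬁona" written with the single-character ligature U+FB01.
ligature_name = "\ufb01ona"
# "björn" written with "o" followed by a combining diaeresis (U+0308).
decomposed_name = "bjo\u0308rn"

# NFC keeps the ligature and merely composes o + diaeresis into "ö".
print(unicodedata.normalize("NFC", ligature_name) == ligature_name)
print(unicodedata.normalize("NFC", decomposed_name) == "bj\u00f6rn")

# NFKC rewrites the ligature into the two letters "f" and "i".
print(unicodedata.normalize("NFKC", ligature_name) == "fiona")
```

After NFKC the original spelling can no longer be recovered, so comparing or storing local parts in NFKC form could merge addresses that the owner considers distinct.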
The problem is that some mail clients (server-tools and / or desktop tools) don't support it and throw an 'invalid email' exception when you try to send a mail to an address which contains umlauts for example.
If you want full support, you can convert the domain part of the email address to "punycode" (the local part cannot be converted this way). This allows users to type in their addresses the usual way while you store them in the widely supported ASCII form.
Example: müller.com » xn--mller-kva.com
Both point to the same thing.
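The conversion can be sketched with Python's built-in `idna` codec. The function name is hypothetical, and the codec implements the older IDNA2003 rules (under which, for example, ß is mapped to ss), so treat this as an illustration rather than a complete IDNA solution. Only the domain is converted; the local part is left untouched.

```python
def to_ascii_domain(address: str) -> str:
    """Convert the domain of an address to its ASCII (Punycode) form."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no @ in address")
    # The idna codec encodes each label and adds the "xn--" prefix as needed.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(to_ascii_domain("info@müller.com"))  # info@xn--mller-kva.com
```

A non-ASCII local part still requires SMTPUTF8 support on the servers involved, because there is no Punycode equivalent for it.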
I would assume yes since a number of top level domains already allow non ascii
characters for domains and since the domain is part of an email address, it's
perfectly possible. An example for such a domain would be www.öko.de
Short answer: yes.
They are allowed not only in the username but also in the domain name.
The answer is yes, but they need to be encoded specially.
Look at this. Read the part that refers to email-headers and RFC 2047.
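As an illustration of the RFC 2047 encoding this answer refers to, Python's `email.utils.formataddr` encodes a non-ASCII display name as an encoded word. Note that this covers the display name only; the same function refuses a non-ASCII address. The name and address below are made-up examples.

```python
from email.utils import formataddr

# The display name is turned into an RFC 2047 encoded word ("=?utf-8?...?=").
header_value = formataddr(("Björn Nußbaum", "bjoern@example.org"))
print(header_value)

# The address itself must be ASCII here, otherwise a UnicodeError is raised.
try:
    formataddr(("Björn Nußbaum", "björn@example.org"))
    address_accepted = True
except UnicodeError:
    address_accepted = False
print(address_accepted)
```

So RFC 2047 helps with names and subjects in headers, but it does not by itself make umlauts in the address usable.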
Not yet. The IETF plans to do this:
H-Online article: IETF planning internationalised email addresses. Here is the RFC: SMTP Extension for Internationalized Email Addresses
Quote from H-Online (as it went down):
The Internet Engineering Task Force (IETF) has published three crucial documents for the standardisation of email address headers
that include symbols outside the ASCII character set. This means that
soon you'll be able to use Chinese characters, French accents, and
German umlauts in email addresses as well as just in the body of the
message. So if your name is Zoë and you work for a company that makes
façades, you might be interested in a new email address. But
representatives of providers are already moaning. They say there would
need to be an "upgrade mania" if the Unicode standard UTF-8 is to
replace the American Standard Code for Information Interchange (ASCII)
currently used as the general email language.
RFC 5335 specifies the use of UTF-8 in practically all email headers.
Changes would have to be made to SMTP clients, SMTP servers, mail user
agents (MUAs), software for mailing lists, gateways to other media,
and everywhere else where email is processed or passed along. RFC 5336
expands the SMTP email transport protocol. At the level of the
protocol, the expansion is labelled UTF8SMTP.
A new header field will be added as a sort of "emergency parachute" to
ensure that UTF-8 emails have a soft landing if they are thrown out
before reaching the recipient by systems that have not been upgraded.
The "OldAddress" is a purely ASCII address. But OldAddress is not to
be used as a channel for a second transfer attempt, but rather to make
sure that feedback is sent home.
Finally, RFC5337 ensures that correct messages are sent pertaining to
the delivery status of non-ASCII emails. The correct address of an
unreachable addressee must be sent back, even if further transport has
been refused. The email Address Internationalization (EAI) working
group is also working on a number of "downgrade mechanisms" for
various header fields and the envelope. If possible, original header
information is to be "packaged" and preserved.
Germany's DeNIC, the registrar for the ".de" domain, is nonetheless
taking this in its stride. "There is really not much we can do",
explained DeNIC spokesperson Klaus Herzig. DeNIC is instead paying
more attention to the update that the IETF is working on for the
standard of international domains – RFC3490, or IDNA2003 as it's
sometimes known. "We are not that happy about it because there is no
backwards compatibility," Herzig explained. When the update comes,
DeNIC says it will be throwing its weight behind the symbol "ß" - also
known as estzett - which has been overlooked up to now. The German
registrar also says that it may wait a bit before switching in light
of the lack of backward compatibility. Once the new standard is
running stably and registrars and providers have adopted it, the ß
will be added.
In contrast, experts believe that Chinese registrars in China and
Taiwan will quickly implement the change for internationalised email.
Representatives of CNIC and TWNIC are authors of the standards.
Chinese users currently have to write emails in ASCII to the left of
the @ and in Chinese characters to the right of it for Chinese
domains, which have already been internationalized.
(Monika Ermert)

How do I upper case an email address?

I expect this should be a pretty easy question. It is in two parts:
Are email addresses case sensitive? (i.e. is foo@bar.com different from Foo@bar.com?)
If so, what is the correct locale to use for capitalising an email address? (i.e. capitalising the email tim@foo.com would be different in the US and Turkish locales)
Judging from the specs the first part can be case sensitive, but normally it's not.
Since it's all ASCII you should be safe using a "naive" uppercase function.
Check out the RFC spec part of the Wikipedia article on e-mail addresses.
If you're in for some heavier reading RFC5322 and RFC5321 should be useful too.
The local-part of the email address (i.e. before the @) is case-sensitive in general. From the Wikipedia entry on E-mail address:
The local-part is case sensitive, so "jsmith@example.com" and "JSmith@example.com" may be delivered to different people. This practice is, however, discouraged by RFC 5321. However, only the authoritative mail-servers for a domain may make that decision.
For the detailed specifications, you may wish to consult the following RFCs:
RFC 5321: Simple Mail Transfer Protocol
RFC 5322: Internet Message Format
RFC 3696: Application Techniques for Checking and Transformation of Names
Domain names are case insensitive,
so foo@BAR.COM is the same email address as foo@bar.com.
For user names, it depends on the mail server. In the Outlook server my company uses, it is also case insensitive.
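Given that only the domain is guaranteed to be case insensitive, a cautious normalisation changes the domain and leaves the local part alone. A minimal sketch with a hypothetical function name; it lowercases the domain, and the same idea works with upper-casing. Python's `str.lower()` and `str.upper()` do not depend on the process locale, so the Turkish dotless-i problem from the question does not arise here.

```python
def normalize_address(address: str) -> str:
    """Lowercase the domain and keep the local part exactly as given."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("no @ in address")
    return f"{local}@{domain.lower()}"


print(normalize_address("Foo@BAR.COM"))  # Foo@bar.com
# str.upper() uses locale-independent Unicode case mappings.
print("tim@foo.com".upper())  # TIM@FOO.COM
```

Whether Foo and foo reach the same mailbox is still up to the receiving server, so folding the local part is a policy decision rather than something the specs allow you to assume.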
Email addresses are not case sensitive.
The local-part of the e-mail address may use any of these ASCII characters:
Uppercase and lowercase English letters (a-z, A-Z)
Digits 0 through 9
Characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~
Character . provided that it is not the first nor last character, nor may it appear two or more times consecutively.
Source: Wikipedia
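The character list above translates fairly directly into a regular expression for an unquoted local part. This is a sketch under the assumption that quoted strings and length limits are out of scope; the names `ATEXT`, `LOCAL_PART_RE` and `is_unquoted_local_part` are made up for illustration.

```python
import re

# One allowed character: letters, digits and the listed special characters.
ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
# Runs of allowed characters separated by single dots, so a dot can be
# neither first nor last, nor appear twice in a row.
LOCAL_PART_RE = re.compile(ATEXT + r"+(?:\." + ATEXT + r"+)*")


def is_unquoted_local_part(local: str) -> bool:
    return LOCAL_PART_RE.fullmatch(local) is not None


for candidate in ("john.smith", ".john", "jo..hn", "JSmith"):
    print(candidate, is_unquoted_local_part(candidate))
```

Both upper and lower case letters are accepted as distinct characters here; the pattern says nothing about whether two spellings name the same mailbox.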