What are the parsed components of a mailto URI - email

An e-mail address is a valid URI when encoded using the mailto scheme:
mailto:user@example.com
(See RFC 6068)
But how should that be parsed according to RFC 3986, the standard for Uniform Resource Identifier (URI): Generic Syntax?
Is the user name part of the e-mail address (user of mailto:user@example.com) the user name of the user info part?
Is the host-name part of the e-mail address (example.com of mailto:user@example.com) the host part?

Despite an e-mail address containing a user name and a host name, when encoded in a mailto URI the whole e-mail address forms the path of the URI, and the user info and host are empty. This is because the user, the password (if present) and the host would constitute the authority component of the URI, which must be preceded by "//".
That is, if the 'mailto' URI scheme had mandated mailto://user@example.com rather than mailto:user@example.com, the parsing would be as expected. A 'mailto' URI is thus, rather strangely, similar in shape to a URN: a scheme followed by a path, with no authority component.
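As a rough illustration (not part of the original answer), Python's standard `urllib.parse.urlsplit` splits the two forms in the way described above. The `mailto://...` form is hypothetical and shown only for contrast; it is not a valid mailto URI.

```python
from urllib.parse import urlsplit

# The real mailto form: no "//", so there is no authority component.
real = urlsplit("mailto:user@example.com")

# Hypothetical form with "//", shown only to contrast the parsing.
hypothetical = urlsplit("mailto://user@example.com")

print(real.scheme, repr(real.netloc), real.path)
print(real.username, real.hostname)
print(hypothetical.username, hypothetical.hostname, repr(hypothetical.path))
```

In the real form the whole address ends up in `path`, and `username` and `hostname` are `None`.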

Related

Is an email mailto link a valid URL?

According to the URL syntax there are supposed to be slashes after the colon following the protocol. An email link, e.g. mailto:bla@shoe.com, however, does not contain these slashes.
Can these addresses be considered valid URLs?
The URI standard is STD 66 which currently maps to RFC 3986.
The double slash you know from some URIs (e.g., from HTTP URIs like http://example.com/) precedes the authority component, but this authority component is not required by the generic URI syntax (only scheme and path are).
So, the mailto URI scheme is not using the authority component, and therefore there is no // after the scheme component.
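One way to see this is to apply the component-splitting regular expression given in Appendix B of RFC 3986. The sketch below wraps it in a small helper; the function name `components` and the dictionary keys are illustrative choices, not anything defined by the RFC.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def components(uri):
    """Split a URI reference into its five generic components."""
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),  # None when there is no "//"
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(components("mailto:bla@shoe.com"))
print(components("http://example.com/"))
```

For the mailto URI the authority is absent and the address is the path; for the HTTP URI the authority is `example.com` and the path is `/`.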

Is username@gtld a valid email? i.e. there is no "domain" portion, it is just a TLD for the hostname

So would username@gtld be a valid email? As a practical example, Google is purchasing the gTLD "gmail". Obviously they can associate A records with that, permitting you to just type http://gmail/ to access the site. But are there any specs that prohibit them from associating MX records with that as well, allowing folks to give out an alternative address username@gmail?
I ask because I want to make sure our email validator is future proof and technically correct.
I think I answered my own question. Section 3.4.1 of RFC 5322, which defines a valid email address, states:
addr-spec = local-part "@" domain
[...]
domain = dot-atom / domain-literal / obs-domain
[...]
The domain portion identifies the point to which the mail is delivered. In the dot-atom form, this is interpreted as an Internet domain name (either a host name or a mail exchanger name) as described in [RFC1034], [RFC1035], and [RFC1123]. In the domain-literal form, the domain is interpreted as the literal Internet address of the particular host.
"gmail" would be a valid domain and host name and thus someone@gtld is a valid email address.
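For a validator, the practical consequence is that the dot-atom form does not require a dot. Below is a minimal sketch of a dot-atom check, assuming the `atext` character set from RFC 5322 section 3.2.3. The helper names are hypothetical, and the sketch deliberately ignores quoted-string local parts, domain literals, comments and the obsolete forms.

```python
import re

# atext per RFC 5322 section 3.2.3: letters, digits and a set of specials.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")

def is_dot_atom(text):
    return DOT_ATOM.fullmatch(text) is not None

def looks_like_addr_spec(address):
    """Syntax-only check: dot-atom local part, "@", dot-atom domain."""
    local, sep, domain = address.rpartition("@")
    return bool(sep) and is_dot_atom(local) and is_dot_atom(domain)

print(looks_like_addr_spec("username@gmail"))
print(looks_like_addr_spec("user@example.com"))
print(looks_like_addr_spec("user@a..b"))
```

A dotless domain such as `gmail` passes, while an empty label such as `a..b` does not.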

Is there a URI scheme for addressing individual email messages?

When someone loses track of an email that has been sent to them, and brings that to the sender's attention, it is common practice for the sender to simply forward or re-send the original email. I want to know if there is any [semi-]standard way to reference a specific email, such that a mail client could open that email if it has a copy of it. This might be in the form of a URI, or possibly some other form. Such a URI might reference the sender, recipient, date, time, or other headers that [should] remain intact between sender and recipient.
The Message-ID is a globally unique identifier for messages.
Note that the Message-ID header is optional, but recommended:
Though listed as optional in the table in section 3.6, every message SHOULD have a "Message-ID:" field.
RFC 2392 specifies the URI scheme mid (which was already reserved in RFC 1738):
The "mid" scheme uses (a part of) the message-id of an email message to refer to a specific message.
An example from RFC 2392:
previous message, shows how the approach you propose can be used to accomplish ...
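As a hedged sketch of how a mail client might build such a reference: a mid URI is the Message-ID without its angle brackets, prefixed with `mid:`, with characters that are not allowed in URLs percent-encoded. The function name `mid_uri` and the sample Message-ID are made up for illustration, and the exact set of characters that needs escaping should be checked against RFC 2392.

```python
from urllib.parse import quote

def mid_uri(message_id):
    """Build a mid: URI from a Message-ID header value (illustrative only)."""
    # Drop surrounding whitespace and the angle brackets.
    bare = message_id.strip().strip("<>")
    # Keep "@" literal; percent-encode everything else that quote() treats as unsafe.
    return "mid:" + quote(bare, safe="@")

# Hypothetical Message-ID, not taken from the RFC.
print(mid_uri("<1234@example.org>"))
```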

IPv6 address as the domain portion of an email address

I'm trying to test a new email validation function I've written, based on this one, but with some minor adjustments.
From a large set of valid and invalid entries, the function finds just one false negative - an address which has an IPv6 address instead of a domain.
user@[IPv6:2001:db8:1ff::a0b:dbd0]
The source is this Wikipedia page: Email Addresses
However, System.Net.IPAddress fails to parse IPv6:2001:db8:1ff::a0b:dbd0, and I can't find any reference in RFC 4291 to an "IPv6:" prefix.
Obviously, IPv6:2001:db8:1ff::a0b:dbd0 is not a valid IPv6 address, but is it valid in an email address? Or is Wikipedia wrong?
Should the actual email be user@[2001:db8:1ff::a0b:dbd0]? Anyone know?
You are right to look at RFC4291 for the IPv6 address format. However, for SMTP (and thus for any other email software handling addresses) you should also look at Address Literals in RFC5321.
The one you want is probably "IPv6-address-literal".
For those still looking for this, the IPv6: prefix tag is required.
https://www.rfc-editor.org/rfc/rfc5321#section-4.1.3
For IPv6 and other forms of addressing that might eventually be standardized, the form consists of a standardized "tag" that identifies the address syntax, a colon, and the address itself ...
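In practice this means a validator has to strip the brackets and the tag before handing the rest to an IP address parser. The question used .NET's System.Net.IPAddress; the sketch below shows the same idea with Python's standard `ipaddress` module. The function name is hypothetical, the tag is compared case-insensitively as an assumption, and `ipaddress` may accept some spellings that the RFC 5321 grammar does not.

```python
import ipaddress

def parse_address_literal(literal):
    """Parse an RFC 5321 style address literal such as "[IPv6:...]" (sketch)."""
    if not (literal.startswith("[") and literal.endswith("]")):
        raise ValueError("address literal must be enclosed in brackets")
    inner = literal[1:-1]
    tag = "IPv6:"
    if inner[:len(tag)].lower() == tag.lower():
        # Tagged form: everything after the tag must be an IPv6 address.
        return ipaddress.IPv6Address(inner[len(tag):])
    # Untagged form: only a dotted-quad IPv4 address is accepted here.
    return ipaddress.IPv4Address(inner)

print(parse_address_literal("[IPv6:2001:db8:1ff::a0b:dbd0]"))
print(parse_address_literal("[192.0.2.1]"))
```

An untagged `[2001:db8:1ff::a0b:dbd0]` raises `ValueError` in this sketch, matching the point that the tag is required.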

Notes Formula Language "@ValidateInternetAddress" Failing to Validate Properly?

We are using the following validation code to check for valid email address formatting on a web form driven by Lotus Notes:
@If((@ValidateInternetAddress([Address821]; @ThisValue)!=""
| @Contains(@ThisValue; "\"") | @Contains(@ThisValue; "'")
| @Contains(@ThisValue; " ")); "Please include a valid email address."; "");
Currently, if a user enters any of the following inputs, the verification throws the error message:
empty field
" ", ', or / character
the domain portion of the email: "test.com"
only @
However, if a user enters test@test the form validates this as a valid email address format.
Is this format considered to be a valid "Address821" format? Or is the form validating an incorrect format as a valid email address?
Yes, it technically is valid address syntax, both by past and current standards.
The language in the RFCs has evolved over time:
RFC-821: 3.7. DOMAINS
Domains are a recently introduced concept in the ARPA Internet mail
system. The use of domains changes the address space from a flat
global space of simple character string host names to a hierarchically
structured rooted tree of global addresses. The host name is replaced
by a domain and host designator which is a sequence of domain element
strings separated by periods with the understanding that the domain
elements are ordered from the most specific to the most general.
This isn't very precise. It doesn't explicitly say that there must be more than one element in the domain name, but it doesn't explicitly prohibit it either. But this was obsoleted by:
RFC-2821: 2.3.5 Domain
A domain (or domain name) consists of one or more dot-separated
components.
...
The domain name, as described in this document and in [22], is the entire, fully-qualified name (often referred to as an "FQDN"). A domain name that is not in FQDN form is no more than a local alias. Local aliases MUST NOT appear in any SMTP transaction.
This seems to be saying that it's illegal, but actually it isn't saying that. I'll explain below, but first let's have a look at the draft standard that is intended to obsolete 2821, and which clarifies things a great deal:
RFC-5321 2.3.5 Domain Names
A domain name (or often just a "domain") consists of one or more components, separated by dots if more than one appears. In the case of a top-level domain used by itself in an email address, a single string is used without any dots. This makes the requirement, described in more detail below, that only fully-qualified domain names appear in SMTP transactions on the public Internet, particularly important where top-level domains are involved.
...
The domain name, as described in this document and in RFC 1035 [2], is the entire, fully-qualified name (often referred to as an "FQDN"). A domain name that is not in FQDN form is no more than a local alias. Local aliases MUST NOT appear in any SMTP transaction.
What this makes clear is that no dot is required in a domain name, as long as it is a top level domain.
@ValidateInternetAddress cannot reasonably know whether "test" is a valid top level domain. Even if IBM programmed in the list of approved public TLDs (which IMHO would be a bad idea since it can and does change), you can in fact set up a private TLD called "test" in your own DNS. That's not the same thing as a "local alias", which the standard does prohibit. There's no rule against actual TLDs.
And for that matter, it could even be a public TLD. Theoretically, the owner of a TLD could set up a mail server for the TLD, i.e. President@US or Queen@UK. That is not likely in those cases, but with all the new TLDs coming online, I wouldn't be surprised if some of the registrars are using info@domain.
I guess theoretically @ValidateInternetAddress could make the DNS call to check whether it can resolve "test" as a TLD, but the doc for that function only says that it checks the syntax of the address, and the existence of the TLD is a semantic issue, not a syntax issue.
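To make the syntax-versus-semantics distinction concrete, here is a minimal syntax-only sketch of the RFC 5321 Domain production (one or more sub-domains separated by dots, each made of letters, digits and inner hyphens). It is not how @ValidateInternetAddress works internally, which the source does not describe; the function name is hypothetical, and no DNS lookup is attempted.

```python
import re

# sub-domain = Let-dig [Ldh-str]: starts and ends with a letter or digit,
# with letters, digits or hyphens in between.
SUB_DOMAIN = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
# Domain = sub-domain *("." sub-domain)
DOMAIN = re.compile(rf"{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*")

def is_rfc5321_domain_syntax(domain):
    """True if the string matches the Domain grammar; says nothing about DNS."""
    return DOMAIN.fullmatch(domain) is not None

print(is_rfc5321_domain_syntax("test"))
print(is_rfc5321_domain_syntax("test.com"))
print(is_rfc5321_domain_syntax("-bad.com"))
```

A single label such as `test` is syntactically fine; whether it resolves is a separate question that only a DNS query can answer.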