The following is the beginning of an SMTP transaction example shown in the
textbook Computer Networking (6th international edition):
S: 220 hamburger.edu
C: HELO crepes.fr
S: 250 Hello crepes.fr, pleased to meet you
The S: prefix indicates that it is a line sent by the server, C: that it is
a line sent by the client. The Wikipedia page on SMTP has an example
with a similar response to HELO.
Is the server's response to HELO spec compliant? RFC 5321
specifies the server's response to HELO/EHLO thus:
ehlo-ok-rsp = ( "250" SP Domain [ SP ehlo-greet ] CRLF )
/ ( "250-" Domain [ SP ehlo-greet ] CRLF
*( "250-" ehlo-line CRLF )
"250" SP ehlo-line CRLF )
As I understand the spec, the server's response in the above example should be
250 hamburger.edu
That is, it should respond with 250 followed by its own hostname, not the
client's hostname, and certainly not the arbitrary greeting message shown
in the example.
What is the proper response to HELO?
Is the Computer Networking example incorrect?
The short answer
The example is not valid due to the stray Hello just after the 250. But that probably doesn't matter.
The long(er) answer
First let's look at the syntax for the client's "HELO":
"HELO" SP Domain CRLF
The client sends HELO followed by a space, followed by its domain name, followed by a CRLF. How do I know that it's the client's domain? Well:
The argument clause contains the fully-qualified domain name of the
SMTP client, if one is available
Now the response:
ehlo-ok-rsp = ( "250" SP Domain [ SP ehlo-greet ] CRLF )
/ ( "250-" Domain [ SP ehlo-greet ] CRLF
*( "250-" ehlo-line CRLF )
"250" SP ehlo-line CRLF )
There are two options here:
"250" SP Domain [ SP ehlo-greet ] CRLF
( "250-" Domain [ SP ehlo-greet ] ...
Neither of them fits, since the server's domain name is expected where we see Hello.
The ehlo-greet part that follows is fine though. It is explained on the next page of the RFC:
ehlo-greet = 1*(%d0-9 / %d11-12 / %d14-127)
; string of any characters other than CR or LF
So it can be any string that doesn't contain \r or \n. The string , pleased to meet you clearly falls into that category.
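To make the grammar concrete, here is a minimal Python sketch of the single-line form of ehlo-ok-rsp. The name parse_helo_reply and the regular expressions are my own illustration, not something from the RFC, and the Domain pattern is a simplification. One thing it shows: a single label such as Hello is itself a syntactically plausible Domain, so the textbook reply parses with Hello as the domain and the rest as the greeting. The problem then shows up as a wrong domain rather than as a parse failure.

```python
import re

# Simplified Domain: dot-separated labels of letters, digits and hyphens.
SUB = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
DOMAIN = rf"{SUB}(?:\.{SUB})*"
# ehlo-greet: one or more characters in 0-127 other than CR (13) and LF (10).
GREET = r"[\x00-\x09\x0b\x0c\x0e-\x7f]+"
SINGLE_LINE = re.compile(
    rf"250 (?P<domain>{DOMAIN})(?: (?P<greet>{GREET}))?\r\n"
)


def parse_helo_reply(line):
    """Return (domain, greeting) for a single-line 250 reply, else None."""
    match = SINGLE_LINE.fullmatch(line)
    if match is None:
        return None
    return match.group("domain"), match.group("greet")


def reply_names_server(line, server_domain):
    """True if the reply parses and its Domain is the server's own name."""
    parsed = parse_helo_reply(line)
    return parsed is not None and parsed[0].lower() == server_domain.lower()
```

With this sketch, reply_names_server("250 hamburger.edu\r\n", "hamburger.edu") is true, while the textbook reply fails the same check because the parsed domain is Hello.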
Why it doesn't matter
Having said all of that, the fact that the example is invalid doesn't mean that it doesn't happen in practice, or that a server sending such a response wouldn't work properly. There's a difference between theory and practice. For instance, if we look at Java's official JavaMail, in SMTPTransport.java, on lines 1662-1665 we find the following code:
if (first) { // skip first line which is the greeting
    first = false;
    continue;
}
This is from a method named ehlo(String domain), which handles sending the EHLO command and reading the response. JavaMail skips the entire greeting line, and I suspect that other clients could be doing the same.
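The same lenient approach can be sketched in Python. This is only an illustration of the idea, not a port of JavaMail; the function name parse_ehlo_reply and the dictionary it returns are assumptions of mine.

```python
def parse_ehlo_reply(lines):
    """Map extension keywords to their parameters for a multi-line 250 reply.

    The first line (the greeting) is skipped without looking at what
    follows the reply code, so a non-compliant greeting does no harm.
    """
    extensions = {}
    for index, line in enumerate(lines):
        code, separator = line[:3], line[3:4]
        rest = line[4:].rstrip("\r\n")
        if code != "250" or separator not in ("-", " "):
            raise ValueError(f"unexpected reply line: {line!r}")
        if index == 0:
            continue  # greeting line: content deliberately ignored
        keyword, _, params = rest.partition(" ")
        extensions[keyword.upper()] = params
    return extensions
```

A client written this way never notices whether the greeting starts with the server's domain or with Hello.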
RFC 821 didn't actually specify what the response to HELO is, but all the examples used the recipient's domain as the body of the message. Later RFCs defined the extended hello (EHLO), which does have a defined response format. But given the lack of clarity in the original standard, practically speaking anything between the 250 and the CR is kosher when the client is so careless as to send HELO instead of EHLO.
Related
In RFC 821, it says that a reset (RSET) command can be sent after a DATA command and some mail data has been sent:
However, what distinguishes between a mail client sending an RSET command after DATA, and a mail that contains the word "RSET" on a line by itself?
I've checked RFC 5321 as well and I can't see anything that would mitigate or escape this. It does talk about escaping a mail line which starts with a ".", but not "RSET".
The client cannot terminate the mail data transfer with a period on a line by itself or the server will send the partial mail it has been given.
I imagine there's something I've missed in the RFCs, otherwise I can't help thinking that there's either an SMTP command injection attack vector in many implementations, or no-one can ever send a mail with "RSET" on a line by itself (I think people would have noticed).
The keyword here is "after", I believe. The DATA command is in progress until it is finished with a lone . on a line.
RFC 5321 § 4.1.1.5 (RSET) states "any stored sender, recipients, and mail data MUST be discarded." This refers to the MAIL FROM, RCPT TO, and presumably DATA commands.
However, upon receiving the . following DATA, the message "MUST" be delivered (which may result in a failure but not a partial failure, see § 4.1.1.4). This clears the buffer of everything RSET is supposed to do.
This means RSET merely elicits a 250 OK response from the receiving server (a keep-alive, much like NOOP) and confirms to the sender that there is indeed no saved sender or recipient queued for the next message.
I do not know of a way to interrupt a DATA command to issue an RSET. The only way I know of to do that is to terminate the connection and establish a new one. Just to be safe in the case of some odd resumption capability, I'd then issue an RSET right after the EHLO or HELO (which the spec says is a NOOP). If there were a way to interrupt DATA, it should be described in RFC 5321 § 4.1.1.4, § 4.1.1.5, and/or § 3.3.
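A toy state machine makes the point about DATA being "in progress" easier to see. This is a sketch under my own assumptions (the function name run_session, the event tuples, and lines given without CRLF are all made up for illustration), not how any particular server is written. While in DATA mode, every line is message content until a lone "." arrives, so a line reading RSET is never looked at as a command.

```python
def run_session(lines):
    """Feed client lines (without CRLF) through a toy command/DATA state machine."""
    in_data = False
    body = []
    events = []
    for line in lines:
        if in_data:
            if line == ".":
                # A lone "." ends the mail data; back to command mode.
                in_data = False
                events.append(("message", "\n".join(body)))
                body = []
            else:
                # Dot-unstuffing: drop the extra leading "." the client added.
                body.append(line[1:] if line.startswith(".") else line)
        elif line.upper() == "DATA":
            in_data = True
        elif line.upper() == "RSET":
            events.append(("reset", None))
        else:
            events.append(("command", line))
    return events
```

Feeding it a session where RSET appears once inside the mail data and once after the final "." yields one message containing the text RSET and exactly one reset event.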
I recently received an email containing the following chunk (don't click!):
<A HrEf="/@/0X0a290d92b/UALI=28389-UI=176738575-OI=279-ONI=5477-SI=0-CI=0-BI=577-II=27913-IDSP=1-KLEM=11-TIE=A-IDE=276135-MID=572-FID=0-DIOM=0" sTyLe=color:#000;font-size:10px;font-family:arial;>
<span>UNS</span></a>
Here is a link to the raw email: https://gist.github.com/anonymous/16963a230cab0a3a1bcfc81209f297f1
As far as I know, /@ is not a valid URL. How is my browser able to resolve it to a site?
As was already mentioned in the comments, @ is allowed in URL paths.
Regarding URL resolution: I guess that the attacker uses a <base> tag to explicitly set the default URL for all relative links in the email body, and hopes that your browser/email client will resolve it for you.
UPDATE
The original guess might be incorrect, since <base> is not supported by the majority of mail clients.
After a bit of investigation I realized that 0x0A290D92B is actually the hex-encoded IPv4 address 162.144.217.43. The only thing which I do not yet understand is how it is supposed to be transformed to http(s)://0x0A290D92B in the browser. It seems like the attacker is targeting specific browser/mail client behavior.
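The decoding is easy to reproduce with Python's standard ipaddress module. The helper name decode_hex_host is just for illustration.

```python
import ipaddress


def decode_hex_host(host):
    """Interpret a hex string such as '0X0a290d92b' as a 32-bit IPv4 address."""
    return str(ipaddress.IPv4Address(int(host, 16)))


print(decode_hex_host("0X0a290d92b"))
```

Each pair of hex digits maps to one octet: A2 is 162, 90 is 144, D9 is 217 and 2B is 43.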
It's treating everything before the @ as auth information that gets passed to the URL. The "real" URL starts after the @, which is the encoded IP address that vsminkov mentioned. So the leading forward slash is discarded.
An easier to read example: http://username:password@example.com/
It's all just layers of obfuscation.
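You can see the split between user information and host with Python's urllib.parse. This only demonstrates how a standard URL parser reads the at-sign form; whether a given browser or mail client resolves the link from the email the same way is a separate question that this sketch does not answer.

```python
from urllib.parse import urlsplit


def describe_authority(url):
    """Return the userinfo and host parts that a standard URL parser sees."""
    parts = urlsplit(url)
    return {
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
    }


print(describe_authority("http://username:password@example.com/"))
```

Everything between the scheme and the at-sign ends up as username and password, and only what follows it is treated as the host.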
Here's an interesting link that goes over it in more detail:
http://www.pc-help.org/obscure.htm
and here's RFC 2396 describing that part of the URL:
URL schemes that involve the direct use of an IP-based protocol to a
specified server on the Internet use a common syntax for the server
component of the URI's scheme-specific data:
<userinfo>@<host>:<port>
where <userinfo> may consist of a user name and, optionally, scheme-
specific information about how to gain authorization to access the
server. The parts "<userinfo>@" and ":<port>" may be omitted.
server = [ [ userinfo "@" ] hostport ]
The user information, if present, is followed by a commercial at-sign
"@".
userinfo = *( unreserved | escaped |
";" | ":" | "&" | "=" | "+" | "$" | "," )
So would username@gtld be a valid email? As a practical example, Google is purchasing the gTLD "gmail". Obviously they can associate A records with that, permitting you to just type http://gmail/ to access the site. But are there any specs that prohibit them from associating MX records with that as well, allowing folks to give out an alternative address username@gmail?
I ask because I want to make sure our email validator is future proof and technically correct.
I think I answered my own question. Section 3.4.1 of RFC 5322, which defines a valid email address, states:
addr-spec = local-part "@" domain
[...]
domain = dot-atom / domain-literal / obs-domain
[...]
The domain portion identifies the point to which the mail is delivered. In the dot-atom form, this is interpreted as an Internet domain name (either a host name or a mail exchanger name) as described in [RFC1034], [RFC1035], and [RFC1123]. In the domain-literal form, the domain is interpreted as the literal Internet address of the particular host.
"gmail" would be a valid domain and host name and thus someone@gtld is a valid email address.
Is there any way to detect (using RFC 2822 headers) that an email is a forwarded email?
There are two things that are normally referred to as "forwarding".
When you set up automatic account-level forwarding to another email address, your mail system will usually introduce an extra header to enable it to detect and break mail loops. Unfortunately, the name of this header has never been standardized. Some use Delivered-To, some use X-Loop, some use X-Original-To, some use an X-header proprietary to their mail software. But there's no single header field that's present in all cases.
When you manually forward a message by clicking the "Forward" button in your mailer and entering a recipient email address and some descriptive text, a new message with a new Message-ID header is generated. The set of headers on this message will be indistinguishable from a normal reply -- In-Reply-To and References are set in exactly the same way. The only difference is that the Subject header will usually start with "Fwd:" or end with "(fwd)". ("Usually" because some clients format it as "[Fwd: <original subject>]" with square brackets around the new subject, some clients localize the prefix Fwd: into their own language, and some users manually edit the Subject before hitting "send".)
So there are good hints that a message is forwarded, but no hard and fast rules.
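Those hints can be collected with Python's standard email package. This is a heuristic sketch only: the function name forwarding_hints, the list of loop headers and the subject pattern are my own choices based on the conventions above. A hit does not prove forwarding and a miss does not rule it out, and localized subject prefixes are not covered.

```python
import re
from email import message_from_string

# Non-standard loop-detection headers mentioned above (not an exhaustive list).
LOOP_HEADERS = ("Delivered-To", "X-Loop", "X-Original-To")
# "Fwd:" / "Fw:" at the start (optionally inside "[") or "(fwd)" at the end.
FWD_SUBJECT = re.compile(r"^\s*\[?\s*fwd?\s*:|\(fwd\)\s*$", re.IGNORECASE)


def forwarding_hints(raw_message):
    """Return the names of header fields that hint at forwarding."""
    msg = message_from_string(raw_message)
    hints = [name for name in LOOP_HEADERS if msg.get(name) is not None]
    if FWD_SUBJECT.search(msg.get("Subject", "")):
        hints.append("Subject")
    return hints
```

An empty result just means none of these particular conventions were used.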
Reading the spec, CTRL+F for "forward" gives the following header fields:
resent-date = "Resent-Date:" date-time CRLF
resent-from = "Resent-From:" mailbox-list CRLF
resent-sender = "Resent-Sender:" mailbox CRLF
resent-to = "Resent-To:" address-list CRLF
resent-cc = "Resent-Cc:" address-list CRLF
resent-bcc = "Resent-Bcc:" (address-list / [CFWS]) CRLF
resent-msg-id = "Resent-Message-ID:" msg-id CRLF
I'm not sure whether the major mail software uses these though.
EDIT
Read the spec a little too quickly, there is also this note:
Note: Reintroducing a message into the transport system and using
resent fields is a different operation from "forwarding".
"Forwarding" has two meanings: One sense of forwarding is that a mail
reading program can be told by a user to forward a copy of a message
to another person, making the forwarded message the body of the new
message. A forwarded message in this sense does not appear to have
come from the original sender, but is an entirely new message from
the forwarder of the message. On the other hand, forwarding is also
used to mean when a mail transport program gets a message and
forwards it on to a different destination for final delivery. Resent
header fields are not intended for use with either type of
forwarding.
There are no other notices of "forwarding", so there are no header fields that you can use to detect the forward, except for the subject = "Fwd: <msg>" convention.
What is SMTP Envelope and SMTP header and what is the relationship between those? How do I extract them with Perl?
An SMTP message contains a set of headers such as From, To, CC, Subject and a whole range of other stuff.
An SMTP Envelope is simply the name given to a small set of headers prefixed to the standard SMTP message when the message is moved about by the Message Transfer Agent (i.e. the SMTP server). The most common envelope headers are X-Sender, X-Receiver and Received.
For example, Microsoft's SMTP Server will add the X-Sender header and a series of X-Receiver headers to the top of a message when it drops the message into its Drop folder. There will be one X-Receiver for each post box that matches the domain the Drop folder is for.
Another example is that an SMTP server adds a Received: header when it receives a message from another SMTP server. This header gives various details of the exchange. Hence most emails on the internet, once arrived at their final destination, will have a series of Received headers indicating the SMTP server hops the message took to arrive. Usually servers remove the X-Sender and X-Receiver headers when the message is finally moved to a POP3 mailbox.
Accessing Headers
On the Windows platform the only way I've found to access the envelope headers is to simply open and parse the eml file. It's a pretty simple format (name: value CR LF).
Again on the Windows platform, the main set of message headers and body parts can be accessed using the CDOSYS.dll COM-based set of objects. How you would do this on other platforms I don't know. However, the header format is quite straightforward, as per the envelope headers; it's accessing the body parts that would require more creative coding.
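On other platforms, one option is Python's standard email package, which parses the same name: value format. The sketch below is an assumption-laden illustration: the function name summarize_eml is made up, and the X-Sender / X-Receiver layout of the input follows the description above rather than any documented file format.

```python
from email import message_from_string


def summarize_eml(raw_eml):
    """Collect envelope-style headers and Received hops from raw .eml text."""
    msg = message_from_string(raw_eml)
    return {
        "x_sender": msg.get("X-Sender"),
        "x_receivers": msg.get_all("X-Receiver", []),
        "received": msg.get_all("Received", []),
        "subject": msg.get("Subject"),
    }
```

Header lookups are case-insensitive, and get_all returns every occurrence in the order it appears in the file, which is what you want for the chain of Received headers.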
The envelope is the addressing information sent to the server during the initial conversation via the "MAIL FROM:" and "RCPT TO:" commands.
The SMTP header is the collection of header lines which are sent after the DATA command is issued.
How you find them is dependent on how/where you're getting the message from, and we'd need a lot more clues to attempt to answer that.
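To illustrate the distinction, here is a small Python sketch that takes the client side of an SMTP conversation and separates the envelope (from the MAIL FROM: and RCPT TO: commands) from the headers (parsed from what follows DATA). The function name split_envelope and the example addresses are hypothetical, and the parsing is deliberately naive: lines are given without CRLF and dot-stuffing is ignored.

```python
from email import message_from_string


def split_envelope(client_lines):
    """Return (envelope sender, envelope recipients, parsed message)."""
    mail_from, rcpt_to, data, in_data = None, [], [], False
    for line in client_lines:
        if in_data:
            if line == ".":
                in_data = False  # end of mail data
            else:
                data.append(line)
        elif line.upper().startswith("MAIL FROM:"):
            mail_from = line[len("MAIL FROM:"):].strip().strip("<>")
        elif line.upper().startswith("RCPT TO:"):
            rcpt_to.append(line[len("RCPT TO:"):].strip().strip("<>"))
        elif line.upper() == "DATA":
            in_data = True
    return mail_from, rcpt_to, message_from_string("\n".join(data))
```

The envelope and the headers need not agree: a blind-copied recipient shows up in a RCPT TO: command but in none of the header lines.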
You can actually think of three different things here: the directives that were exchanged between the SMTP MTAs (during each hop the message took), the headers that were generated by the MUA, and the headers that were added (or modified) by MTAs along the route that a given message traversed.
The "envelope" refers to the information provided to the MTA (normally the most recent or final destination MTA). The sender includes a set of headers after the DATA directive in the SMTP connection (separated from the body of the message by a blank line ... but double check the RFC if that's specifically supposed to be a CR/LF pair). Note that the local MTA may add additional headers and might even modify some headers before storing or forwarding the message.
(Normally it should only add Received: headers).
Some MTAs are configured to add X-Envelope-To: and/or X-Envelope-From: headers. Some of them will still filter the contents of these headers (for example to prevent leakage of blind copies). (Scenario: the original MUA had a BCC: line directing that a number of people be copied on the message without their names appearing to one another as they would in the CC: headers; for each recipient domain (MX result) the MTA will issue RCPT TO: only for the subset of addresses for which the host is the appropriate result (its own hub, smarthost, or any valid MX for the target); thus any subset of recipients who share an MX with each other would see leakage in the X-Envelope-To: headers generated by MTAs that were sloppy about the handling of this detail.)
Also note that an Envelope-From line would only contain a host/domain name as supplied by the HELO or EHLO directives in the SMTP exchange. It cannot be used as a return address, for replies for example.
For Perl email related stuff have a look at the Perl Email Project.