Forwarded email detection

Forwarded email detection - email

Is there any way to detect (using RFC 2822 headers) that an email is a forwarded email?

There are two things that are normally referred to as "forwarding".
When you set up automatic account-level forwarding to another email address, your mail system will usually introduce an extra header to enable it to detect and break mail loops. Unfortunately, the name of this header has never been standardized. Some use Delivered-To, some use X-Loop, some use X-Original-To, some use an X-header proprietary to their mail software. But there's no single header field that's present all cases.
When you manually forward a message by clicking the "Forward" button in your mailer and entering a recipient email address and some descriptive text, a new message with a new Message-ID header is generated. The set of headers on this message will be indistinguishable from a normal reply -- In-Reply-To and References are set in exactly the same way. The only difference is that the Subject header will usually start with "Fwd:" or end with "(fwd)". ("Usually" because some clients format it as "[Fwd: <original subject>]" with square brackets around the new subject, some clients localize the prefix Fwd: into their own language, and some users manually edit the Subject before hitting "send".)
So there are good hints that a message is forwarded, but no hard and fast rules.

Reading the spec, CTRL+F for "forward" gives the following header fields:
resent-date = "Resent-Date:" date-time CRLF
resent-from = "Resent-From:" mailbox-list CRLF
resent-sender = "Resent-Sender:" mailbox CRLF
resent-to = "Resent-To:" address-list CRLF
resent-cc = "Resent-Cc:" address-list CRLF
resent-bcc = "Resent-Bcc:" (address-list / [CFWS]) CRLF
resent-msg-id = "Resent-Message-ID:" msg-id CRLF
I'm not sure whether the major mail software uses these though.
EDIT
Read the spec a little too quickly, there is also this note:
Note: Reintroducing a message into the transport system and using
resent fields is a different operation from "forwarding".
"Forwarding" has two meanings: One sense of forwarding is that a mail
reading program can be told by a user to forward a copy of a message
to another person, making the forwarded message the body of the new
message. A forwarded message in this sense does not appear to have
come from the original sender, but is an entirely new message from
the forwarder of the message. On the other hand, forwarding is also
used to mean when a mail transport program gets a message and
forwards it on to a different destination for final delivery. Resent
header fields are not intended for use with either type of
forwarding.
There are no other notices of "forwarding", so there are no header fields that you can use to detect the forward, except for the subject = "Fwd: <msg>" convention.

Related

Is there a URI schema for addressing individual email messages?

When someone loses track of an email that has been sent to them, and brings that to the sender's attention, it is common practice for the sender to simply forward or re-send the original email. I want to know if there is any [semi-]standard way to reference a specific email, such that a mail client could open that email if it has a copy of it. This might be in the form of a URI, or possibly some other form. Such a URI might reference the sender, recipient, date, time, or other headers that [should] remain intact between sender and recipient.

The Message-ID is a globally unique identifier for messages.
Note that the Message-ID header is optional, but recommended:
Though listed as optional in the table in section 3.6, every message SHOULD have a "Message-ID:" field.
RFC 2392 specifies the URI scheme mid (which was already reserved in RFC 1738):
The "mid" scheme uses (a part of) the message-id of an email message to refer to a specific message.
An example from RFC 2392:
previous message, shows how the approach you propose can be used to accomplish ...

Persistence of custom headers within an email thread

I this is probably a strange question, but I thought I'd go ahead and ask. Say, I send an email, using IMAP SMTP, through a special client. This client adds a few custom headers to the email message before sending it on its way. The recipient receives this email and responds to me directly (and maybe CC's a few people as well).
My question is this: Given the above example, would these X-headers persist throughout all the new messages within the thread?
One thing I can think of is the client would be aware of the original email message it sent. All subsequent responses to this email would have a "Reply-To" header whose value equals the "Message-Id" of the previous email. I don't see why I couldn't crawl up these thread of replies until I get to the original message sent by the client, thereby deriving the original custom headers.
Maybe I'm over-thinking this. Any suggestions? :)

A message reply does not necessarily contain anything of the original message. The MUA is likely to suggest a modified (e.g. prepended with "Re:") version of the original subject, and obviously the addresses are utilised for appropriate defaults as well. None of the other content of the message forms part of the reply (unless the sender deliberately includes it, as with quoting or forwarding). Any X- headers that you have in your message will certainly not be included in the reply (unless you have control over that MUA).
However, your plan of tracking the original message is certainly feasible: see Section 3.6.4 of RFC 5322. Every message should (not must) have a Message-ID header, and should have In-Reply-To and References headers when appropriate.
The "Message-ID:" field contains a single unique message identifier. The "References:" and "In-Reply-To:" fields each contain one or more unique message identifiers, optionally separated by [whitespace].
In-Reply-To is mention to identify the message (or messages) that is (are) being replied to, while References identifies the entire thread of conversation. The References header is meant to contain the entire contents of the References header of the message being replied to, so you only need the last message to identify the entire thread.
Note that In-Reply-To and Reply-To are not the same thing (the latter specifies the address that the sender wishes replies to be sent to).
Assuming that you have the original message, then you should be able to use the References header of any reply to identify the original message. Not every MUA will handle References or In-Reply-To correctly, but most will.

As far as I know, there's no reason to think any email client would propagate any header lines it doesn't understand. Most will preserve the subject (usually adding "Re: " if necessary) and derive their "To: " and "Cc: " lines from the previous message's headers, but that's about it. I suppose some (but not all) will generate an "In-Reply-To" line, but that's as far as it goes.
Your idea of having a client crawl back through the thread looking for specific headers sounds like it might be do-able, but you'd have to write your own email client if you want that feature, and you'd still be blocked by the fact that not all email clients preserve message threading in any way.

What heuristics should I use to prevent an autoresponder war?

I am currently extending an e-mail system with an autoresponse feature. In a dark past, I've seen some awesome mail loops, and I'm now trying to avoid such a thing from happening to me.
I've looked at how other tools ('mailbot', 'vacation') are doing this, grepped my own mail archive for suspicious mail headers, but I wonder if there is something else I can add.
My process at this point:
Refuse if sender address is invalid (this should get rid of messages with <> sender)
Refuse if sender address matches one of the following:
'^root#',
'^hostmaster#',
'^postmaster#',
'^nobody#',
'^www#',
'-request#'
Refuse if one of these headers (after whitespace normalization and lowercasing) is present:
'^precedence: junk$',
'^precedence: bulk$',
'^precedence: list$',
'^list-id:',
'^content-type: multipart/report$',
'^x-autogenerated: reply$',
'^auto-submit: yes$',
'^subject: auto-response$'
Refuse if sender address was already seen by the autoresponder in the recent past.
Refuse if the sender address is my own address :)
Accept and send autoresponse, prepending Auto-response: to the subject, setting headers Precedence: bulk and Auto-Submit: yes to hopefully prevent some remote mailer from propagating the autoresponse any further.
Is there anything I'm missing?

In my research so far I've come up with these rules.
Treat inbound message as autogenerated, ignore it and blacklist the sender if...
Return-Path header is <> or missing/invalid
Auto-Submitted header is present with any value other than "no"
X-Auto-Response-Suppress header is present
In-Reply-To header is missing
Note: If I'm reading RFC3834 correctly, your own programs SHOULD set this, but so far it seems some autoresponders omit this (freshdesk.com)
When sending outbound messages, be sure to...
Set the Auto-Submitted: auto-generated header (or auto-replied as appropriate)
Set your SMTP MAIL FROM: command with the null address <>
Note some delivery services including Amazon SES will set their own value here, so this may not be feasible
Check the recipient against the blacklist built up by the inbound side and abort sending to known autoresponders
Consider sending not more than 1 message per unit time (long like 24 hours) to a given recipient
Notes on other answers and points
I think ignoring Precedence: list messages will cause false positives, at least for my app's configuration
I believe the OP's "auto-submit" rule is a typo and the official header is Auto-Submitted
References
RFC3834
This SO question about Precedence header has several good answers
Wikipedia Email Loop Article
desk.com article
Comments welcome and I'll update this answer as this is a good question and I'd like to see an authoritative answer created.

Update 2014-05-22
To find if an inbound message is an "out-of-office" or other automatic reply, we use that procedure:
First, Find if header "In-Reply-To" is present. If not, that is an auto-reply.
Else, check if 1 of these header is present:
X-Auto-Response-Suppress (any value)
Precedence (value contains bulk, or junk or list)
X-Webmin-Autoreply (value 1)
X-Autogenerated (value Reply)
X-AutoReply (value YES)

Include a phrase like "This is an automatically-generated response" in the body somewhere. If your message body is HTML (not plain text) you can use a style to make it not visible.
Check for this phrase before responding. If it exists, odds are good it's an automated response.

How do I extract the SMTP envelope and header with Perl?

What is SMTP Envelope and SMTP header and what is the relationship between those? How do I extract them with Perl?

An SMTP message contains a set of headers such as From, To, CC, Subject and a whole range of other stuff.
An SMTP Envelope is simply the name given to a small set of header prefixed to the standard SMTP message when the message is moved about by the Message Transport Agent (ie. the SMTP server). The most common envelope headers are X-Sender, X-Receiver and Received.
For example Microsofts SMTP Server will add the X-Sender and a series of X-Receiver headers to the top of a message when it drops the message into its Drop folder. There will be one X-Receiver for each post box that matches the domain the Drop folder is for.
Another example is SMTP servers add a Receive: header when it receives a message from another SMTP server. This header gives various details of the exchange. Hence most emails on the tinternet once arrived at the final destination will have a series of Receive headers indicating the SMTP server hops the message took to arrive. Usually servers remove the X-Sender, X-Receiver headers when the message is finally moved to a POP3 mailbox.
Accessing Headers
On the windows platform the only way I've found to access the envelope headers is to simply open and parse the eml file. Its a pretty simple format (name: value CR LF).
Again on the windows platform the main set of message headers and body parts can be accessed using the CDOSYS.dll COM based set of objects. How you would do this on other platforms I don't know. However the header format is quite straight forward as per the envelope headers, its accessing the body parts that would require more creative coding.

The envelope is the addressing information sent to the server during the initial conversation via the "MAIL FROM:" and "RCPT TO:" commands.
The SMTP header is the collection of header lines which are sent after the DATA command is issued.
How you find them is dependant on how/where you're getting the message from, and we'd need a lot more clues to attempt to answer that.

You can actually think of three different things here. There are the directives that were exchanged between the SMTP MTAs (during each hop the message took) ... the headers that were generated by the MUA and headers that were added (or modified) by MTAs along the route that a given message traversed.
The "envelope" refers to the information provided to the MTA (normally the most recent or final destination MTA). The sender includes a set of headers after the DATA directive in the SMTP connection (separated from the body of the message by a blank line ... but double check the RFC if that's specifically supposed to be a CR/LF pair). Note that the local MTA may add additonal headers and might even modify some headers before storing or forwarding the message.
(Normally it should only add Received-by: headers).
Some MTAs are configured to add X-Envelope-To: and/or X-Envelope-From: headers. Some of them will still filter the contents of these headers (for example to prevent leakage of blind copies). (Senario: the original MUA had a BCC: line directory that a number of people be copied on the message with their names all appearing to one another in the CC: headers; for each recipient domain (MX result) the MTA will only issue RCPT TO: for only the subset of addresses for which the host if the appropriate result (its own hub, smarthost, or any valid MX for the target) --- thus any subsets of recipients who share an MX with each other would see leakage in the X-Envelope-To: headers generated by MTAs that were sloppy about the handling of this detail).
Also not that an Envelope-From line would only contain a host/domain name as supplied by the HELO FROM: or EHLO FROM: directives in the SMTP exchange. It cannot be used as a return address, for replies for example.

For Perl email related stuff have a look at the Perl Email Project.

Correct format of an Return-Path header

My application uses sendmail to send outbound email. I set the 'From:' address using the following format:
Fred Dibnah <fred#dibnah.com>
I'm also setting the Reply-To and Return-Path headers using the exact same format.
This seems to work in the vast majority of cases but I have seen at least one instance in which this fails, namely when the name part of the above string contains a period (full stop):
Fred Dibnah, Inc. <fred#dibnah.com>
This fails deep inside the TMail code (I'm using Ruby) but it seems like a perfectly valid thing to do.
My question is, should I actually be setting the Return-Path and Reply-To headers using only the email address as opposed to the above Name + Email format? E.g.
fred#dibnah.com
Thanks.

In a situation like this, it is best to turn to the RFCs.
Upon reading up on your question, it appears as if You shouldn't be setting the Return-Path value ever. The final destination SMTP server is supposed to be setting this value as it transitions the message to your mailbox (http://www.faqs.org/rfcs/rfc2821.html starting at 4.4).
According to http://www.faqs.org/rfcs/rfc2822.html the Reply-To field can have the following formats
local-part "#" domain (fred#dibnah.com for example)
display-name (Fred Dibna for example)
I would recommend using option 1 as it seems to be the most basic, and you will likely have less issues with that format. In choosing option 1, your Reply-To field should look like the following:
Reply-To: fred#dibna.com

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse