Cleaning bounce mail - email

I'm wondering whether cleaning the bounce mail is done by parsing the returned message text and creating rules (if contains bla bla bla) or by extracting a certain error code number.
Please clarify.

Bounces are handled using VERP, i.e. a specially formed reverse-path. The reverse-path address is the one where to bounce will be sent, and it contains the original recipient in encoded form and maybe other informations too. Something like: bounce-john+example.com#list.example.org. See Variable envelope return path.
The analysis of the bounce mail is good for retrieving the exact cause of the bounce, especially if the bounce is sent in the standard DSN format, but otherwise it is not a reliable method (for example not all mail servers use DSN).

Related

separate email from original email using perl

When people email each other, they generally include the original email in their reply to a sender, adding a little more information each time to the email. Each email client seems to have a different way of adding the original email to a reply.
I need to parse email arriving at our mail server and try and extract the new part of the message, and I'm wondering if there is a sensible way to strip this appended (or prepended) information (the "original message") and just get the new information in a mail body? I believe sadly, that there is no encoding, the original email is simply added to the new message, but I thought I'd check with the experts?
thanks.
No, there is no simple, straightforward algorithm to separate quoted or forwarded text from new content. Quoting and forwarding are poorly standardized and different conventions have existed at different times.
Having said that, e.g. Google's Gmail succeeds fairly well in practice. With enough samples, you can clearly come up with reasonable heuristics.
Good indicators for quoted material are forwarded (pseudo-) headers and indented text, perhaps with a quote indicator along the left margin before the quoted text. You occasionally see outdents as well.
Traditionally, on Usenet in the early 1990s, people would use different, unique quoting styles.
: ~ | This seems to be the original.
: ~ This is the first reply.
: This is the second reply.
This is the third reply, quoting the
previous three messages in sequence.
Around 1995, both clients and standardization initiatives by and large converged on "wedge" quotes;
> >> This seems to be the original.
> > This is the first reply.
> This is the second reply.
This is the third reply, quoting the
previous three messages in sequence.
Then along came Microsoft and ruined it all. I suppose that top quoting makes sense in some corporate settings where you quickly need to collect all the background from a thread to a new participant, but even for that purpose it's a horrible abomination.
This is the third reply, quoting the
previous three messages in sequence.
---- Begin forwarded message ----
From: Him [smtp:bogus]
To: His Friend
Subject: VS: Re: Same as on this message
Date: nothing machine-readable
This is the second reply.
---- Alkuperäinen viesti ----
Lähettäjä: His Friend [smtp:poppycock]
Saaja: Some Guy
Aihe: Re: Same as on this message
Päivämäärä: olisiko eilen ehkä
This is the first reply.
----- Original message ----
From: Somebody Else [smtp:mindless]
To: Some Guy
Subject: Same as on this message
Date: like, the day before
This seems to be the original.

how to notice if a mail is a forwarded mail?

I have a very special problem.
If we create a mail in Outlook, we add a UserProperty which contains a DataBase-ID of our System, so we can Link the mail to the representing DataBase-Item. On the service which reads the mails in each Mailbox and imports them automatically I can read this property by using ExtendedPropertyDefinitions. So far everything is fine...
If the User now forwards the message in Outlook, Olk copies the UserProperty to the new message. And now my problems beginn. Now my Service thinks the new message is also linked to our database and updates DB-Entry with the new Body and new Subject.
So does anyone now how to find out if a message is a forwarded one or how to tell Outlook not to copy the userproperty to the forwarded (new) message?
thx. Jay
What we thought about, but isnt working for our case
- a second userproperty containing a simple tag linke "fromSystem". Cause this would be copied too.
- a second userproperty containing a hashsum calculated from subject and Body. Cause both could be changed by the user. We just create the message, add all properties and Display it. from this Point on we no longer have control what is Happening to the mail until the Service handles it.
Your service consuming EWS should check the ConversationIndex and only update the database if it's 22 bytes long (original source message). Forward emails and reply emails keep appending 5 bytes (10 chars) to the ConversationIndex extending it beyond 22 bytes.
Sample ConversationIndexes
Original: 01CDD15D80E51C1D4522172840ACA96287DA28A15D97
Reply: 01CDD15D80E51C1D4522172840ACA96287DA28A15D970000018630
Forward: 01CDD15D80E51C1D4522172840ACA96287DA28A15D970000018630000000FC30
ConversationIndex represents the sequential ordering of the ConversationTopic (essentially GUID + timestamp). See Working with Conversations on MSDN. ConversationIndex is explicitly defined on MSDN here.
if (message.ConversationIndex.Length == 22)
{
// update DB body, subject, etc.
}
Also make sure you load the EmailMessageSchema.ConversationIndex before trying to access its value.

Persistence of custom headers within an email thread

I this is probably a strange question, but I thought I'd go ahead and ask. Say, I send an email, using IMAP SMTP, through a special client. This client adds a few custom headers to the email message before sending it on its way. The recipient receives this email and responds to me directly (and maybe CC's a few people as well).
My question is this: Given the above example, would these X-headers persist throughout all the new messages within the thread?
One thing I can think of is the client would be aware of the original email message it sent. All subsequent responses to this email would have a "Reply-To" header whose value equals the "Message-Id" of the previous email. I don't see why I couldn't crawl up these thread of replies until I get to the original message sent by the client, thereby deriving the original custom headers.
Maybe I'm over-thinking this. Any suggestions? :)
A message reply does not necessarily contain anything of the original message. The MUA is likely to suggest a modified (e.g. prepended with "Re:") version of the original subject, and obviously the addresses are utilised for appropriate defaults as well. None of the other content of the message forms part of the reply (unless the sender deliberately includes it, as with quoting or forwarding). Any X- headers that you have in your message will certainly not be included in the reply (unless you have control over that MUA).
However, your plan of tracking the original message is certainly feasible: see Section 3.6.4 of RFC 5322. Every message should (not must) have a Message-ID header, and should have In-Reply-To and References headers when appropriate.
The "Message-ID:" field contains a single unique message identifier. The "References:" and "In-Reply-To:" fields each contain one or more unique message identifiers, optionally separated by [whitespace].
In-Reply-To is mention to identify the message (or messages) that is (are) being replied to, while References identifies the entire thread of conversation. The References header is meant to contain the entire contents of the References header of the message being replied to, so you only need the last message to identify the entire thread.
Note that In-Reply-To and Reply-To are not the same thing (the latter specifies the address that the sender wishes replies to be sent to).
Assuming that you have the original message, then you should be able to use the References header of any reply to identify the original message. Not every MUA will handle References or In-Reply-To correctly, but most will.
As far as I know, there's no reason to think any email client would propagate any header lines it doesn't understand. Most will preserve the subject (usually adding "Re: " if necessary) and derive their "To: " and "Cc: " lines from the previous message's headers, but that's about it. I suppose some (but not all) will generate an "In-Reply-To" line, but that's as far as it goes.
Your idea of having a client crawl back through the thread looking for specific headers sounds like it might be do-able, but you'd have to write your own email client if you want that feature, and you'd still be blocked by the fact that not all email clients preserve message threading in any way.

What heuristics should I use to prevent an autoresponder war?

I am currently extending an e-mail system with an autoresponse feature. In a dark past, I've seen some awesome mail loops, and I'm now trying to avoid such a thing from happening to me.
I've looked at how other tools ('mailbot', 'vacation') are doing this, grepped my own mail archive for suspicious mail headers, but I wonder if there is something else I can add.
My process at this point:
Refuse if sender address is invalid (this should get rid of messages with <> sender)
Refuse if sender address matches one of the following:
'^root#',
'^hostmaster#',
'^postmaster#',
'^nobody#',
'^www#',
'-request#'
Refuse if one of these headers (after whitespace normalization and lowercasing) is present:
'^precedence: junk$',
'^precedence: bulk$',
'^precedence: list$',
'^list-id:',
'^content-type: multipart/report$',
'^x-autogenerated: reply$',
'^auto-submit: yes$',
'^subject: auto-response$'
Refuse if sender address was already seen by the autoresponder in the recent past.
Refuse if the sender address is my own address :)
Accept and send autoresponse, prepending Auto-response: to the subject, setting headers Precedence: bulk and Auto-Submit: yes to hopefully prevent some remote mailer from propagating the autoresponse any further.
Is there anything I'm missing?
In my research so far I've come up with these rules.
Treat inbound message as autogenerated, ignore it and blacklist the sender if...
Return-Path header is <> or missing/invalid
Auto-Submitted header is present with any value other than "no"
X-Auto-Response-Suppress header is present
In-Reply-To header is missing
Note: If I'm reading RFC3834 correctly, your own programs SHOULD set this, but so far it seems some autoresponders omit this (freshdesk.com)
When sending outbound messages, be sure to...
Set the Auto-Submitted: auto-generated header (or auto-replied as appropriate)
Set your SMTP MAIL FROM: command with the null address <>
Note some delivery services including Amazon SES will set their own value here, so this may not be feasible
Check the recipient against the blacklist built up by the inbound side and abort sending to known autoresponders
Consider sending not more than 1 message per unit time (long like 24 hours) to a given recipient
Notes on other answers and points
I think ignoring Precedence: list messages will cause false positives, at least for my app's configuration
I believe the OP's "auto-submit" rule is a typo and the official header is Auto-Submitted
References
RFC3834
This SO question about Precedence header has several good answers
Wikipedia Email Loop Article
desk.com article
Comments welcome and I'll update this answer as this is a good question and I'd like to see an authoritative answer created.
Update 2014-05-22
To find if an inbound message is an "out-of-office" or other automatic reply, we use that procedure:
First, Find if header "In-Reply-To" is present. If not, that is an auto-reply.
Else, check if 1 of these header is present:
X-Auto-Response-Suppress (any value)
Precedence (value contains bulk, or junk or list)
X-Webmin-Autoreply (value 1)
X-Autogenerated (value Reply)
X-AutoReply (value YES)
Include a phrase like "This is an automatically-generated response" in the body somewhere. If your message body is HTML (not plain text) you can use a style to make it not visible.
Check for this phrase before responding. If it exists, odds are good it's an automated response.

How to make an email bot that replies to users not reply to auto-responses and get itself into mail loops

I have a bot that replies to users. But sometimes when my bot sends its reply, the user or their email provider will auto-respond (vacation message, bounce message, error from mailer-daemon, etc). That is then a new message from the user (so my bot thinks) that it in turn replies to. Mail loop!
I'd like my bot to only reply to real emails from real humans. I'm currently filtering out email that admits to being bulk precedence or from a mailing list or has the Auto-Submitted header equal to "auto-replied" or "auto-generated" (see code below). But I imagine there's a more comprehensive or standard way to deal with this. (I'm happy to see solutions in other languages besides Perl.)
NB: Remember to have your own bot declare that it is autoresponding! Include
Auto-Submitted: auto-reply
in the header of your bot's email.
My original code for avoiding mail loops follows. Only reply if realmail returns true.
sub realmail {
my($email) = #_;
$email =~ /\nSubject\:\s*([^\n]*)\n/s;
my $subject = $1;
$email =~ /\nPrecedence\:\s*([^\n]*)\n/s;
my $precedence = $1;
$email =~ /\nAuto-Submitted\:\s*([^\n]*)\n/s;
my $autosub = $1;
return !($precedence =~ /bulk|list|junk/i ||
$autosub =~ /(auto\-replied|auto\-generated)/i ||
$subject =~ /^undelivered mail returned to sender$/i
);
}
(The Subject check is surely unnecessary; I just added these checks one at a time as problems arose and the above now seems to work so I don't want to touch it unless there's something definitively better.)
RFC 3834 provides some guidance for what you should do, but here are some concrete guidelines:
Set your envelope sender to a different email address than your auto-responder so bounces don't feed back into the system.
I always store in a database a key of when an email response was sent from a specific address to another address. Under no circumstance will I ever respond to the same address more than once in a 10 minute period. This alone stopped all loops, but doesn't ensure nice behavior (auto-responses to mailing lists are annoying).
Make sure you add any permutation of header that other people are matching on to stop loops. Here's the list I use:
X-Loop: autoresponder
Auto-Submitted: auto-replied
Precedence: bulk (autoreply)
Here are some header regex's I use to avoid loops and to try to play nice:
/^precedence:\s+(?:bulk|list|junk)/i
/^X-(?:Loop|Mailing-List|BeenThere|Mailman)/i
/^List-/i
/^Auto-Submitted:/i
/^Resent-/i
I also avoid responding if any of these are the envelop senders:
if ($sender eq ""
|| $sender =~ /^(?:request|owner|admin|bounce|bounces)-|-(?:request|owner|admin|bounce|bounces)\#|^(?:mailer-daemon|postmaster|daemon|majordomo|ma
ilman|bounce)\#|(?:listserv|listsrv)/i) {
That really sounds like something that's probably available as a module from CPAN, but I didn't find anything clearly relevant in five minutes of searching. Mail::Lite::Mbox::Processor looks like it might do what you want:
Mail::Lite::Message::Matcher is a
framework for automated mail
processing. For example you have a
mail server and you have a need to
process some types of incoming mail
messages automatically. For example,
you can extract automated
notifications, invoices, alerts etc.
from your mail flow and perform some
tasks based on content of those
messages.
but its docs are sparse enough that it isn't immediately obvious whether it provides those example functions itself or if you have to provide the code to drive them.
In any case, though, if you haven't already checked CPAN, that's where I would start if I wanted to do something like this.
My answer here only deals with bounces which is more straightforward.
Using DSN (Delivery Status Notification) identifier will help you detect a DSN/bounced message. It should go to Return-Path and not Reply-To.
Here's a sample of a typical DSN message. The header information includes the message id, content type has specific values (delivery-status) etc.
Not able to provide you any codes in perl, just my 2 cents of idea.
PS: Do note that not all mail servers or MTA conforms to this, but I guess most do.
There should be a standard way of dealing with this, but the problem is that you'd have to assume that systems that send auto-replies comply to that standard, when most the time, they just don't.
How do you get the address that you reply to? I hope you aren't using the From: header. Check the Reply-to: header first and if that doesn't exist, use the Return-path:.
But whatever you do, you will simply have to keep a log of what you sent to whom and throttle your bot to some sensible value of messages per time.