How to make an email bot that replies to users not reply to auto-responses and get itself into mail loops - perl

I have a bot that replies to users. But sometimes when my bot sends its reply, the user or their email provider will auto-respond (vacation message, bounce message, error from mailer-daemon, etc). That is then a new message from the user (so my bot thinks) that it in turn replies to. Mail loop!
I'd like my bot to only reply to real emails from real humans. I'm currently filtering out email that admits to being bulk precedence or from a mailing list or has the Auto-Submitted header equal to "auto-replied" or "auto-generated" (see code below). But I imagine there's a more comprehensive or standard way to deal with this. (I'm happy to see solutions in other languages besides Perl.)
NB: Remember to have your own bot declare that it is autoresponding! Include
Auto-Submitted: auto-reply
in the header of your bot's email.
My original code for avoiding mail loops follows. Only reply if realmail returns true.
sub realmail {
my($email) = #_;
$email =~ /\nSubject\:\s*([^\n]*)\n/s;
my $subject = $1;
$email =~ /\nPrecedence\:\s*([^\n]*)\n/s;
my $precedence = $1;
$email =~ /\nAuto-Submitted\:\s*([^\n]*)\n/s;
my $autosub = $1;
return !($precedence =~ /bulk|list|junk/i ||
$autosub =~ /(auto\-replied|auto\-generated)/i ||
$subject =~ /^undelivered mail returned to sender$/i
);
}
(The Subject check is surely unnecessary; I just added these checks one at a time as problems arose and the above now seems to work so I don't want to touch it unless there's something definitively better.)

RFC 3834 provides some guidance for what you should do, but here are some concrete guidelines:
Set your envelope sender to a different email address than your auto-responder so bounces don't feed back into the system.
I always store in a database a key of when an email response was sent from a specific address to another address. Under no circumstance will I ever respond to the same address more than once in a 10 minute period. This alone stopped all loops, but doesn't ensure nice behavior (auto-responses to mailing lists are annoying).
Make sure you add any permutation of header that other people are matching on to stop loops. Here's the list I use:
X-Loop: autoresponder
Auto-Submitted: auto-replied
Precedence: bulk (autoreply)
Here are some header regex's I use to avoid loops and to try to play nice:
/^precedence:\s+(?:bulk|list|junk)/i
/^X-(?:Loop|Mailing-List|BeenThere|Mailman)/i
/^List-/i
/^Auto-Submitted:/i
/^Resent-/i
I also avoid responding if any of these are the envelop senders:
if ($sender eq ""
|| $sender =~ /^(?:request|owner|admin|bounce|bounces)-|-(?:request|owner|admin|bounce|bounces)\#|^(?:mailer-daemon|postmaster|daemon|majordomo|ma
ilman|bounce)\#|(?:listserv|listsrv)/i) {

That really sounds like something that's probably available as a module from CPAN, but I didn't find anything clearly relevant in five minutes of searching. Mail::Lite::Mbox::Processor looks like it might do what you want:
Mail::Lite::Message::Matcher is a
framework for automated mail
processing. For example you have a
mail server and you have a need to
process some types of incoming mail
messages automatically. For example,
you can extract automated
notifications, invoices, alerts etc.
from your mail flow and perform some
tasks based on content of those
messages.
but its docs are sparse enough that it isn't immediately obvious whether it provides those example functions itself or if you have to provide the code to drive them.
In any case, though, if you haven't already checked CPAN, that's where I would start if I wanted to do something like this.

My answer here only deals with bounces which is more straightforward.
Using DSN (Delivery Status Notification) identifier will help you detect a DSN/bounced message. It should go to Return-Path and not Reply-To.
Here's a sample of a typical DSN message. The header information includes the message id, content type has specific values (delivery-status) etc.
Not able to provide you any codes in perl, just my 2 cents of idea.
PS: Do note that not all mail servers or MTA conforms to this, but I guess most do.

There should be a standard way of dealing with this, but the problem is that you'd have to assume that systems that send auto-replies comply to that standard, when most the time, they just don't.
How do you get the address that you reply to? I hope you aren't using the From: header. Check the Reply-to: header first and if that doesn't exist, use the Return-path:.
But whatever you do, you will simply have to keep a log of what you sent to whom and throttle your bot to some sensible value of messages per time.

Related

Mailgun API: Batch Sending vs. Individual Calls

Background
We're building an application that will process & send emails via Mailgun. These are sometimes one-off messages, initiated by a transaction. Some emails, though, will be sent to 30k+ at once.
Eg, a newsletter to all members.
Considerations
Mailgun offers a Batch Sending option with their API. Using "Recipient Variables", you can include dynamic values that are paired with a particular user.
This Batch Sending functionality is limited, however. You cannot send more than 1,000 recipients per request, which means we have to iterate through a recipient list (on our database) for each set of 1,000. Mailgun provides an example of how this might work, using Python (scroll about 2/3 down).
Question
Are there any advantages to batch sending (ie, sending an email to a group of recipients through a single API call, using recipient variables) as opposed to making our own loop, variable substitutions and individual API calls?
I assume this is more taxing on our server, as it would be processing each message itself, instead of just offloading all that data to Mailgun's server for heavy-lifting on their end. But I also like the flexibility & simplicity of handling that on our end and sending a "fully-rendered" message to Mailgun, one at a time, without having to iterate 1k at a time.
Any thoughts on best practices, or considerations we should take into account?
Stumbled onto this today, and felt it provided a pretty good summary/answer for my original question. I wanted to post this as an answer, in case anybody else has this question and hasn't found this Mailgun post. Straight from the horse's mouth, too. The nutshell version:
For PHP, at least, the SDK has a Mailgun class, with a BatchMessage() method. This actually handles the counting of recipients for you, so you can just queue up as many email addresses as you want (ie, more than 1k) and Mailgun will fire off to the API endpoint as needed. Pretty slick!
Here's their original wording, plus a link to the page.
Sending a message with Mailgun PHP SDK + Batch Message:
Batch Message
In addition to Message Builder, we have Batch Message. This class
allows you to build a message and submit a template message in
batches, up to 1,000 recipients per post. The benefit of using this
class is that the recipients tally is monitored and will automatically
submit the message to the endpoint when you've added the 1,000th
recipient. This means you can build your message and begin iterating
through your database. Forget about sending the message, the SDK will
keep track of posting to the API when necessary.
// First, instantiate the SDK with your API credentials and define your domain.
$mgClient = new Mailgun("key-example");
$domain = "example.com";
// Next, instantiate a Message Builder object from the SDK, pass in your sending domain.
$batchMsg = $mgClient->BatchMessage($domain);
// Define the from address.
$batchMsg->setFromAddress("dwight#example.com",
array("first"=>"Dwight", "last" => "Schrute"));
// Define the subject.
$batchMsg->setSubject("Help!");
// Define the body of the message.
$batchMsg->setTextBody("The printer is on fire!");
// Next, let's add a few recipients to the batch job.
$batchMsg->addToRecipient("pam#example.com",
array("first" => "pam", "last" => "Beesly"));
$batchMsg->addToRecipient("jim#example.com",
array("first" => "Jim", "last" => "Halpert"));
$batchMsg->addToRecipient("andy#example.com",
array("first" => "Andy", "last" => "Bernard"));
// ...etc...etc...
// After 1,000 recipeints,
// Batch Message will automatically post your message to the messages endpoint.
// Call finalize() to send any remaining recipients still in the buffer.
$batchMsg->finalize();
The answer of #cdwyer and #nikoshr is very helpful, but bit legacy. Used methods in the example are deprecated. Here is current usage of lib:
$batchMessage = $this->mailgun->messages()->getBatchMessage('mydomain.com');
$batchMessage->setFromAddress('user#domain.com');
$batchMessage->setReplyToAddress('user2#domain.com');
$batchMessage->setSubject('Contact form | Company');
$batchMessage->setHtmlBody('<html>...</html>');
foreach ($recipients as $recipient) {
$batchMessage->addToRecipient($recipient);
}
$batchMessage->finalize();
More info at documentation.

separate email from original email using perl

When people email each other, they generally include the original email in their reply to a sender, adding a little more information each time to the email. Each email client seems to have a different way of adding the original email to a reply.
I need to parse email arriving at our mail server and try and extract the new part of the message, and I'm wondering if there is a sensible way to strip this appended (or prepended) information (the "original message") and just get the new information in a mail body? I believe sadly, that there is no encoding, the original email is simply added to the new message, but I thought I'd check with the experts?
thanks.
No, there is no simple, straightforward algorithm to separate quoted or forwarded text from new content. Quoting and forwarding are poorly standardized and different conventions have existed at different times.
Having said that, e.g. Google's Gmail succeeds fairly well in practice. With enough samples, you can clearly come up with reasonable heuristics.
Good indicators for quoted material are forwarded (pseudo-) headers and indented text, perhaps with a quote indicator along the left margin before the quoted text. You occasionally see outdents as well.
Traditionally, on Usenet in the early 1990s, people would use different, unique quoting styles.
: ~ | This seems to be the original.
: ~ This is the first reply.
: This is the second reply.
This is the third reply, quoting the
previous three messages in sequence.
Around 1995, both clients and standardization initiatives by and large converged on "wedge" quotes;
> >> This seems to be the original.
> > This is the first reply.
> This is the second reply.
This is the third reply, quoting the
previous three messages in sequence.
Then along came Microsoft and ruined it all. I suppose that top quoting makes sense in some corporate settings where you quickly need to collect all the background from a thread to a new participant, but even for that purpose it's a horrible abomination.
This is the third reply, quoting the
previous three messages in sequence.
---- Begin forwarded message ----
From: Him [smtp:bogus]
To: His Friend
Subject: VS: Re: Same as on this message
Date: nothing machine-readable
This is the second reply.
---- Alkuperäinen viesti ----
Lähettäjä: His Friend [smtp:poppycock]
Saaja: Some Guy
Aihe: Re: Same as on this message
Päivämäärä: olisiko eilen ehkä
This is the first reply.
----- Original message ----
From: Somebody Else [smtp:mindless]
To: Some Guy
Subject: Same as on this message
Date: like, the day before
This seems to be the original.

Apache Camel mail to identify auto generated messages

I'm looking for a way to identify auto generated messages like Outlook's "Out of office" replies.
I stumbled upon a header called "Auto-submitted" that's supposed to do the trick, but Camel doesn't seems to provide this header in the "Message" object. Reference: http://www.iana.org/assignments/auto-submitted-keywords/auto-submitted-keywords.xml
Is it possible to know if a message is auto generated or human generated?
I don't know Apache Camel, but I can tell you that there is no simple and safe way to detect automated email messages in general. Headers like auto-submitted are an indicator, but unfortunately lots of automated scripts do not add them. I once had to write an out-of-office implementation that should not send ooo replies to any automated messages (mailing lists, spam, newsletters, etc.). Here is what I finally came up with, maybe this helps in your case as well:
Sender address regular expressions that indicate automated senders:
"^owner-"
"^request-"
"-request#"
"bounce.*#"
"-confirm#"
"-errors#"
"^no[-]?reply"
"^donotreply"
"^postmaster#"
"^mailer[-_]daemon#"
"^mailer#"
"^listserv#"
"^majordom[o]?#"
"^mailman#"
"^nobody#"
"^bounce"
"^www(-data)?#"
"^mdaemon#"
"^root#"
"^news(letter)?#"
"^webmaster#" (role address - may not be a good indicator in your case)
"^administrator#" (role address - may not be a good indicator in your case)
"^support#" (role address - may not be a good indicator in your case)
Headers that indicate automated messages if they exist:
list-help
list-unsubscribe
list-subscribe
list-owner
list-post
list-archive
list-id
mailing-List
x-facebook-notify
x-mailing-list
x-cron-env
x-autoresponse
x-eBay-mailtracker
Headers that indicate automated messages if they have a special value:
'x-spam-flag':'yes'
'x-spam-status':'yes'
'X-Spam-Flag2': 'yes'
'precedence':'(bulk|list|junk)'
'x-precedence':'(bulk|list|junk)'
'x-barracuda-spam-status':'yes'
'x-dspam-result':'(spam|bl[ao]cklisted)'
'X-Mailer':'^Mail$'
'auto-submitted':'auto-replied'

What heuristics should I use to prevent an autoresponder war?

I am currently extending an e-mail system with an autoresponse feature. In a dark past, I've seen some awesome mail loops, and I'm now trying to avoid such a thing from happening to me.
I've looked at how other tools ('mailbot', 'vacation') are doing this, grepped my own mail archive for suspicious mail headers, but I wonder if there is something else I can add.
My process at this point:
Refuse if sender address is invalid (this should get rid of messages with <> sender)
Refuse if sender address matches one of the following:
'^root#',
'^hostmaster#',
'^postmaster#',
'^nobody#',
'^www#',
'-request#'
Refuse if one of these headers (after whitespace normalization and lowercasing) is present:
'^precedence: junk$',
'^precedence: bulk$',
'^precedence: list$',
'^list-id:',
'^content-type: multipart/report$',
'^x-autogenerated: reply$',
'^auto-submit: yes$',
'^subject: auto-response$'
Refuse if sender address was already seen by the autoresponder in the recent past.
Refuse if the sender address is my own address :)
Accept and send autoresponse, prepending Auto-response: to the subject, setting headers Precedence: bulk and Auto-Submit: yes to hopefully prevent some remote mailer from propagating the autoresponse any further.
Is there anything I'm missing?
In my research so far I've come up with these rules.
Treat inbound message as autogenerated, ignore it and blacklist the sender if...
Return-Path header is <> or missing/invalid
Auto-Submitted header is present with any value other than "no"
X-Auto-Response-Suppress header is present
In-Reply-To header is missing
Note: If I'm reading RFC3834 correctly, your own programs SHOULD set this, but so far it seems some autoresponders omit this (freshdesk.com)
When sending outbound messages, be sure to...
Set the Auto-Submitted: auto-generated header (or auto-replied as appropriate)
Set your SMTP MAIL FROM: command with the null address <>
Note some delivery services including Amazon SES will set their own value here, so this may not be feasible
Check the recipient against the blacklist built up by the inbound side and abort sending to known autoresponders
Consider sending not more than 1 message per unit time (long like 24 hours) to a given recipient
Notes on other answers and points
I think ignoring Precedence: list messages will cause false positives, at least for my app's configuration
I believe the OP's "auto-submit" rule is a typo and the official header is Auto-Submitted
References
RFC3834
This SO question about Precedence header has several good answers
Wikipedia Email Loop Article
desk.com article
Comments welcome and I'll update this answer as this is a good question and I'd like to see an authoritative answer created.
Update 2014-05-22
To find if an inbound message is an "out-of-office" or other automatic reply, we use that procedure:
First, Find if header "In-Reply-To" is present. If not, that is an auto-reply.
Else, check if 1 of these header is present:
X-Auto-Response-Suppress (any value)
Precedence (value contains bulk, or junk or list)
X-Webmin-Autoreply (value 1)
X-Autogenerated (value Reply)
X-AutoReply (value YES)
Include a phrase like "This is an automatically-generated response" in the body somewhere. If your message body is HTML (not plain text) you can use a style to make it not visible.
Check for this phrase before responding. If it exists, odds are good it's an automated response.

Parse and display MIME multipart email on website

I have a raw email, (MIME multipart), and I want to display this on a website (e.g. in an iframe, with tabs for the HTML part and the plain text part, etc.). Are there any CPAN modules or Template::Toolkit plugins that I can use to help me achieve this?
At the moment, it's looking like I'll have to parse the message with Email::MIME, then iterate over all the parts, and write a handler for all the different mime types.
It's a long shot, but I'm wondering if anyone has done all this already? It's going to be a long and error prone process writing handlers if I attempt it myself.
Thanks for any help.
I actually just dealt with this problem just a few months ago. I added an email feature to the product I work for, both sending and receiving. The first part was sending reminders to users, but we didn't want to manage the bounce backs for our customer admins, we decided to have a message inbox that the admins could see bounces and replies without us, and the admins can deal with adjusting email addresses if they needed to.
Because of this, we accept all email that is sent to an inbox we watch. We use VERP to associate an email with a user, and store the entire email as is in the database. Then, when the admin requests to see the email, we have to parse the email.
My first attempt was very similar to an earlier answer. If one of the parts is html, show it. If it's text, show it. Otherwise, show the original, raw email. This broke down real fast with a few emails not generated by sendmail. Outlook, Exchange, and a few other email systems don't do that, they use multiparts to send the email. After a lot of digging and cussing, I discovered that the problem doesn't appear to be well documented. With the help of looking through MHonArc and reading the RFC's (RFC2045 and RFC2046), I settled on the solution below. I decided on not using MHonArc, since I couldn't easily resuse the parsing and display functionality. I wouldn't say this is perfect, but it's been good enough that we used it.
First, take the message and use Email::MIME to parse it. Then call a function called get_part with the array of parts Email::MIME gives you with ->parts().
get_part, for each part it was passed, decodes the content type, looks it up in a hash, and if it exists, call the function associated with that content type. If the decoder was able to give us something, put it on a result array.
The last piece of the puzzle is this decoder array. Basically, it defines the content types I can deal with:
text/html
text/plain
message/delivery-status, which is actually also plain text
multipart/mixed
multipart/related
multipart/alternative
The non-multipart sections I return as is. With mixed, related and alternative, I merely call get_parts on that MIME node and returns the results. Because alternative is special, it has some extra code after calling get_parts. It will only return html if it has an html part, or it will return only the text part of it has a text part. If it has neither, it won't return anything valid.
The advantage with the hash of valid content types is that I can easily add logic for more parts as needed. And by the time you get_parts is done, you should have an array of all content you care about.
One more item I should mention. As a part of this, we created a separate domain that actually serves these messages. The main domain that an admin works on will refuse to serve the message and redirect the browser to our user content domain. This second domain will only serve user content. This is to help the browser properly sandbox the content away from our main domain. See same origin policy (http://en.wikipedia.org/wiki/Same_origin_policy)
It doesn't sound like a difficult job to me:
use Email::MIME;
my $parsed = Email::MIME->new($message);
my #parts = $parsed->parts; # These will be Email::MIME objects, too.
print <<EOF;
<html><head><title>!</title></head><body>
EOF
for my $part (#parts) {
my $content_type = $parsed->content_type;
if ($content_type eq "text/plain") {
print "<pre>", $part->body (), "</pre>\n";
}
elsif ($content_type eq "text/html") {
print $part->body ();
}
# Handle some more cases here
}
print <<EOF;
</body></html>
EOF
Reuse existing complete software. The MHonArc mail-to-HTML converter has excellent MIME support.