Apache Camel mail to identify auto generated messages - email

I'm looking for a way to identify auto generated messages like Outlook's "Out of office" replies.
I stumbled upon a header called "Auto-submitted" that's supposed to do the trick, but Camel doesn't seems to provide this header in the "Message" object. Reference: http://www.iana.org/assignments/auto-submitted-keywords/auto-submitted-keywords.xml
Is it possible to know if a message is auto generated or human generated?

I don't know Apache Camel, but I can tell you that there is no simple and safe way to detect automated email messages in general. Headers like auto-submitted are an indicator, but unfortunately lots of automated scripts do not add them. I once had to write an out-of-office implementation that should not send ooo replies to any automated messages (mailing lists, spam, newsletters, etc.). Here is what I finally came up with, maybe this helps in your case as well:
Sender address regular expressions that indicate automated senders:
"^owner-"
"^request-"
"-request#"
"bounce.*#"
"-confirm#"
"-errors#"
"^no[-]?reply"
"^donotreply"
"^postmaster#"
"^mailer[-_]daemon#"
"^mailer#"
"^listserv#"
"^majordom[o]?#"
"^mailman#"
"^nobody#"
"^bounce"
"^www(-data)?#"
"^mdaemon#"
"^root#"
"^news(letter)?#"
"^webmaster#" (role address - may not be a good indicator in your case)
"^administrator#" (role address - may not be a good indicator in your case)
"^support#" (role address - may not be a good indicator in your case)
Headers that indicate automated messages if they exist:
list-help
list-unsubscribe
list-subscribe
list-owner
list-post
list-archive
list-id
mailing-List
x-facebook-notify
x-mailing-list
x-cron-env
x-autoresponse
x-eBay-mailtracker
Headers that indicate automated messages if they have a special value:
'x-spam-flag':'yes'
'x-spam-status':'yes'
'X-Spam-Flag2': 'yes'
'precedence':'(bulk|list|junk)'
'x-precedence':'(bulk|list|junk)'
'x-barracuda-spam-status':'yes'
'x-dspam-result':'(spam|bl[ao]cklisted)'
'X-Mailer':'^Mail$'
'auto-submitted':'auto-replied'

Related

Mailgun API: Batch Sending vs. Individual Calls

Background
We're building an application that will process & send emails via Mailgun. These are sometimes one-off messages, initiated by a transaction. Some emails, though, will be sent to 30k+ at once.
Eg, a newsletter to all members.
Considerations
Mailgun offers a Batch Sending option with their API. Using "Recipient Variables", you can include dynamic values that are paired with a particular user.
This Batch Sending functionality is limited, however. You cannot send more than 1,000 recipients per request, which means we have to iterate through a recipient list (on our database) for each set of 1,000. Mailgun provides an example of how this might work, using Python (scroll about 2/3 down).
Question
Are there any advantages to batch sending (ie, sending an email to a group of recipients through a single API call, using recipient variables) as opposed to making our own loop, variable substitutions and individual API calls?
I assume this is more taxing on our server, as it would be processing each message itself, instead of just offloading all that data to Mailgun's server for heavy-lifting on their end. But I also like the flexibility & simplicity of handling that on our end and sending a "fully-rendered" message to Mailgun, one at a time, without having to iterate 1k at a time.
Any thoughts on best practices, or considerations we should take into account?
Stumbled onto this today, and felt it provided a pretty good summary/answer for my original question. I wanted to post this as an answer, in case anybody else has this question and hasn't found this Mailgun post. Straight from the horse's mouth, too. The nutshell version:
For PHP, at least, the SDK has a Mailgun class, with a BatchMessage() method. This actually handles the counting of recipients for you, so you can just queue up as many email addresses as you want (ie, more than 1k) and Mailgun will fire off to the API endpoint as needed. Pretty slick!
Here's their original wording, plus a link to the page.
Sending a message with Mailgun PHP SDK + Batch Message:
Batch Message
In addition to Message Builder, we have Batch Message. This class
allows you to build a message and submit a template message in
batches, up to 1,000 recipients per post. The benefit of using this
class is that the recipients tally is monitored and will automatically
submit the message to the endpoint when you've added the 1,000th
recipient. This means you can build your message and begin iterating
through your database. Forget about sending the message, the SDK will
keep track of posting to the API when necessary.
// First, instantiate the SDK with your API credentials and define your domain.
$mgClient = new Mailgun("key-example");
$domain = "example.com";
// Next, instantiate a Message Builder object from the SDK, pass in your sending domain.
$batchMsg = $mgClient->BatchMessage($domain);
// Define the from address.
$batchMsg->setFromAddress("dwight#example.com",
array("first"=>"Dwight", "last" => "Schrute"));
// Define the subject.
$batchMsg->setSubject("Help!");
// Define the body of the message.
$batchMsg->setTextBody("The printer is on fire!");
// Next, let's add a few recipients to the batch job.
$batchMsg->addToRecipient("pam#example.com",
array("first" => "pam", "last" => "Beesly"));
$batchMsg->addToRecipient("jim#example.com",
array("first" => "Jim", "last" => "Halpert"));
$batchMsg->addToRecipient("andy#example.com",
array("first" => "Andy", "last" => "Bernard"));
// ...etc...etc...
// After 1,000 recipeints,
// Batch Message will automatically post your message to the messages endpoint.
// Call finalize() to send any remaining recipients still in the buffer.
$batchMsg->finalize();
The answer of #cdwyer and #nikoshr is very helpful, but bit legacy. Used methods in the example are deprecated. Here is current usage of lib:
$batchMessage = $this->mailgun->messages()->getBatchMessage('mydomain.com');
$batchMessage->setFromAddress('user#domain.com');
$batchMessage->setReplyToAddress('user2#domain.com');
$batchMessage->setSubject('Contact form | Company');
$batchMessage->setHtmlBody('<html>...</html>');
foreach ($recipients as $recipient) {
$batchMessage->addToRecipient($recipient);
}
$batchMessage->finalize();
More info at documentation.

phpmailer error codes for outcome processing

I am building a mailout capability and it is working OK as far as it goes. However, I want to distinguish between various potential (high level) outcomes in order to determine what happens to each message after the current send attempt.
This must be a common requirement so I seem to be missing something pretty obvious, but I can't find anything that addresses it, either here or via Google or on PHPMailer site or .. . Possibly because there are so many questions about specific errors that I just can't find anything useful in all the other results.
At very high level:
Attempt send, and assess resulting error/result. Identify whether this message has been sent, must be retried later, or failed permanently.
- success -> update message status as 'SENT: OK'
- sent, but some issues (e.g. one recipient failed, others processed OK)-> 'SENT: some error'
- failed, due to temporary problem (e.g. connection problem, attachment open) -> 'TRY LATER'
- failed, due to message-specific problem that we should NOT try to resend-> 'FAILED: some error'
As I was unable to find an existing resource with e.g. a table of errors, I spent some time working through the phpmailerException code to try to build one myself, but it's not simple because a) they don't appear to have been designed in terms of this kind of grouping logic, b) it is not easy to uniquely identify a particular error: PHPMailer provides human-friendly messages, which are different in different languages, rather than an identifiable code - given that my solution will need to work across different language installations that's a problem!
Obviously SMTP itself provides a range of errorcodes which I could potentially use for this purpose, but how do I access these via PHPMailer? (This would work for me as I only use SMTP at this point - however, this would NOT work if other message transport like sendmail was used, so I would prefer a PHPMailer solution)
If you want individual result codes for individual address, you really need to send each message separately. If you do get errors on some recipients, they will be listed in the ErrorInfo property - look in the smtpSend function to see how the error string is assembled. I agree that it's not especially easy to parse that info out. The error messages in PHPMailer are generally more for the developer than the end user, so the translations are not that significant. You can get slightly more information about errors if you enable exceptions rather than relying only on return values.

Persistence of custom headers within an email thread

I this is probably a strange question, but I thought I'd go ahead and ask. Say, I send an email, using IMAP SMTP, through a special client. This client adds a few custom headers to the email message before sending it on its way. The recipient receives this email and responds to me directly (and maybe CC's a few people as well).
My question is this: Given the above example, would these X-headers persist throughout all the new messages within the thread?
One thing I can think of is the client would be aware of the original email message it sent. All subsequent responses to this email would have a "Reply-To" header whose value equals the "Message-Id" of the previous email. I don't see why I couldn't crawl up these thread of replies until I get to the original message sent by the client, thereby deriving the original custom headers.
Maybe I'm over-thinking this. Any suggestions? :)
A message reply does not necessarily contain anything of the original message. The MUA is likely to suggest a modified (e.g. prepended with "Re:") version of the original subject, and obviously the addresses are utilised for appropriate defaults as well. None of the other content of the message forms part of the reply (unless the sender deliberately includes it, as with quoting or forwarding). Any X- headers that you have in your message will certainly not be included in the reply (unless you have control over that MUA).
However, your plan of tracking the original message is certainly feasible: see Section 3.6.4 of RFC 5322. Every message should (not must) have a Message-ID header, and should have In-Reply-To and References headers when appropriate.
The "Message-ID:" field contains a single unique message identifier. The "References:" and "In-Reply-To:" fields each contain one or more unique message identifiers, optionally separated by [whitespace].
In-Reply-To is mention to identify the message (or messages) that is (are) being replied to, while References identifies the entire thread of conversation. The References header is meant to contain the entire contents of the References header of the message being replied to, so you only need the last message to identify the entire thread.
Note that In-Reply-To and Reply-To are not the same thing (the latter specifies the address that the sender wishes replies to be sent to).
Assuming that you have the original message, then you should be able to use the References header of any reply to identify the original message. Not every MUA will handle References or In-Reply-To correctly, but most will.
As far as I know, there's no reason to think any email client would propagate any header lines it doesn't understand. Most will preserve the subject (usually adding "Re: " if necessary) and derive their "To: " and "Cc: " lines from the previous message's headers, but that's about it. I suppose some (but not all) will generate an "In-Reply-To" line, but that's as far as it goes.
Your idea of having a client crawl back through the thread looking for specific headers sounds like it might be do-able, but you'd have to write your own email client if you want that feature, and you'd still be blocked by the fact that not all email clients preserve message threading in any way.

What heuristics should I use to prevent an autoresponder war?

I am currently extending an e-mail system with an autoresponse feature. In a dark past, I've seen some awesome mail loops, and I'm now trying to avoid such a thing from happening to me.
I've looked at how other tools ('mailbot', 'vacation') are doing this, grepped my own mail archive for suspicious mail headers, but I wonder if there is something else I can add.
My process at this point:
Refuse if sender address is invalid (this should get rid of messages with <> sender)
Refuse if sender address matches one of the following:
'^root#',
'^hostmaster#',
'^postmaster#',
'^nobody#',
'^www#',
'-request#'
Refuse if one of these headers (after whitespace normalization and lowercasing) is present:
'^precedence: junk$',
'^precedence: bulk$',
'^precedence: list$',
'^list-id:',
'^content-type: multipart/report$',
'^x-autogenerated: reply$',
'^auto-submit: yes$',
'^subject: auto-response$'
Refuse if sender address was already seen by the autoresponder in the recent past.
Refuse if the sender address is my own address :)
Accept and send autoresponse, prepending Auto-response: to the subject, setting headers Precedence: bulk and Auto-Submit: yes to hopefully prevent some remote mailer from propagating the autoresponse any further.
Is there anything I'm missing?
In my research so far I've come up with these rules.
Treat inbound message as autogenerated, ignore it and blacklist the sender if...
Return-Path header is <> or missing/invalid
Auto-Submitted header is present with any value other than "no"
X-Auto-Response-Suppress header is present
In-Reply-To header is missing
Note: If I'm reading RFC3834 correctly, your own programs SHOULD set this, but so far it seems some autoresponders omit this (freshdesk.com)
When sending outbound messages, be sure to...
Set the Auto-Submitted: auto-generated header (or auto-replied as appropriate)
Set your SMTP MAIL FROM: command with the null address <>
Note some delivery services including Amazon SES will set their own value here, so this may not be feasible
Check the recipient against the blacklist built up by the inbound side and abort sending to known autoresponders
Consider sending not more than 1 message per unit time (long like 24 hours) to a given recipient
Notes on other answers and points
I think ignoring Precedence: list messages will cause false positives, at least for my app's configuration
I believe the OP's "auto-submit" rule is a typo and the official header is Auto-Submitted
References
RFC3834
This SO question about Precedence header has several good answers
Wikipedia Email Loop Article
desk.com article
Comments welcome and I'll update this answer as this is a good question and I'd like to see an authoritative answer created.
Update 2014-05-22
To find if an inbound message is an "out-of-office" or other automatic reply, we use that procedure:
First, Find if header "In-Reply-To" is present. If not, that is an auto-reply.
Else, check if 1 of these header is present:
X-Auto-Response-Suppress (any value)
Precedence (value contains bulk, or junk or list)
X-Webmin-Autoreply (value 1)
X-Autogenerated (value Reply)
X-AutoReply (value YES)
Include a phrase like "This is an automatically-generated response" in the body somewhere. If your message body is HTML (not plain text) you can use a style to make it not visible.
Check for this phrase before responding. If it exists, odds are good it's an automated response.

How to make an email bot that replies to users not reply to auto-responses and get itself into mail loops

I have a bot that replies to users. But sometimes when my bot sends its reply, the user or their email provider will auto-respond (vacation message, bounce message, error from mailer-daemon, etc). That is then a new message from the user (so my bot thinks) that it in turn replies to. Mail loop!
I'd like my bot to only reply to real emails from real humans. I'm currently filtering out email that admits to being bulk precedence or from a mailing list or has the Auto-Submitted header equal to "auto-replied" or "auto-generated" (see code below). But I imagine there's a more comprehensive or standard way to deal with this. (I'm happy to see solutions in other languages besides Perl.)
NB: Remember to have your own bot declare that it is autoresponding! Include
Auto-Submitted: auto-reply
in the header of your bot's email.
My original code for avoiding mail loops follows. Only reply if realmail returns true.
sub realmail {
my($email) = #_;
$email =~ /\nSubject\:\s*([^\n]*)\n/s;
my $subject = $1;
$email =~ /\nPrecedence\:\s*([^\n]*)\n/s;
my $precedence = $1;
$email =~ /\nAuto-Submitted\:\s*([^\n]*)\n/s;
my $autosub = $1;
return !($precedence =~ /bulk|list|junk/i ||
$autosub =~ /(auto\-replied|auto\-generated)/i ||
$subject =~ /^undelivered mail returned to sender$/i
);
}
(The Subject check is surely unnecessary; I just added these checks one at a time as problems arose and the above now seems to work so I don't want to touch it unless there's something definitively better.)
RFC 3834 provides some guidance for what you should do, but here are some concrete guidelines:
Set your envelope sender to a different email address than your auto-responder so bounces don't feed back into the system.
I always store in a database a key of when an email response was sent from a specific address to another address. Under no circumstance will I ever respond to the same address more than once in a 10 minute period. This alone stopped all loops, but doesn't ensure nice behavior (auto-responses to mailing lists are annoying).
Make sure you add any permutation of header that other people are matching on to stop loops. Here's the list I use:
X-Loop: autoresponder
Auto-Submitted: auto-replied
Precedence: bulk (autoreply)
Here are some header regex's I use to avoid loops and to try to play nice:
/^precedence:\s+(?:bulk|list|junk)/i
/^X-(?:Loop|Mailing-List|BeenThere|Mailman)/i
/^List-/i
/^Auto-Submitted:/i
/^Resent-/i
I also avoid responding if any of these are the envelop senders:
if ($sender eq ""
|| $sender =~ /^(?:request|owner|admin|bounce|bounces)-|-(?:request|owner|admin|bounce|bounces)\#|^(?:mailer-daemon|postmaster|daemon|majordomo|ma
ilman|bounce)\#|(?:listserv|listsrv)/i) {
That really sounds like something that's probably available as a module from CPAN, but I didn't find anything clearly relevant in five minutes of searching. Mail::Lite::Mbox::Processor looks like it might do what you want:
Mail::Lite::Message::Matcher is a
framework for automated mail
processing. For example you have a
mail server and you have a need to
process some types of incoming mail
messages automatically. For example,
you can extract automated
notifications, invoices, alerts etc.
from your mail flow and perform some
tasks based on content of those
messages.
but its docs are sparse enough that it isn't immediately obvious whether it provides those example functions itself or if you have to provide the code to drive them.
In any case, though, if you haven't already checked CPAN, that's where I would start if I wanted to do something like this.
My answer here only deals with bounces which is more straightforward.
Using DSN (Delivery Status Notification) identifier will help you detect a DSN/bounced message. It should go to Return-Path and not Reply-To.
Here's a sample of a typical DSN message. The header information includes the message id, content type has specific values (delivery-status) etc.
Not able to provide you any codes in perl, just my 2 cents of idea.
PS: Do note that not all mail servers or MTA conforms to this, but I guess most do.
There should be a standard way of dealing with this, but the problem is that you'd have to assume that systems that send auto-replies comply to that standard, when most the time, they just don't.
How do you get the address that you reply to? I hope you aren't using the From: header. Check the Reply-to: header first and if that doesn't exist, use the Return-path:.
But whatever you do, you will simply have to keep a log of what you sent to whom and throttle your bot to some sensible value of messages per time.