Parse and display MIME multipart email on website - perl

I have a raw email, (MIME multipart), and I want to display this on a website (e.g. in an iframe, with tabs for the HTML part and the plain text part, etc.). Are there any CPAN modules or Template::Toolkit plugins that I can use to help me achieve this?
At the moment, it's looking like I'll have to parse the message with Email::MIME, then iterate over all the parts, and write a handler for all the different mime types.
It's a long shot, but I'm wondering if anyone has done all this already? It's going to be a long and error prone process writing handlers if I attempt it myself.
Thanks for any help.

I actually just dealt with this problem just a few months ago. I added an email feature to the product I work for, both sending and receiving. The first part was sending reminders to users, but we didn't want to manage the bounce backs for our customer admins, we decided to have a message inbox that the admins could see bounces and replies without us, and the admins can deal with adjusting email addresses if they needed to.
Because of this, we accept all email that is sent to an inbox we watch. We use VERP to associate an email with a user, and store the entire email as is in the database. Then, when the admin requests to see the email, we have to parse the email.
My first attempt was very similar to an earlier answer. If one of the parts is html, show it. If it's text, show it. Otherwise, show the original, raw email. This broke down real fast with a few emails not generated by sendmail. Outlook, Exchange, and a few other email systems don't do that, they use multiparts to send the email. After a lot of digging and cussing, I discovered that the problem doesn't appear to be well documented. With the help of looking through MHonArc and reading the RFC's (RFC2045 and RFC2046), I settled on the solution below. I decided on not using MHonArc, since I couldn't easily resuse the parsing and display functionality. I wouldn't say this is perfect, but it's been good enough that we used it.
First, take the message and use Email::MIME to parse it. Then call a function called get_part with the array of parts Email::MIME gives you with ->parts().
get_part, for each part it was passed, decodes the content type, looks it up in a hash, and if it exists, call the function associated with that content type. If the decoder was able to give us something, put it on a result array.
The last piece of the puzzle is this decoder array. Basically, it defines the content types I can deal with:
text/html
text/plain
message/delivery-status, which is actually also plain text
multipart/mixed
multipart/related
multipart/alternative
The non-multipart sections I return as is. With mixed, related and alternative, I merely call get_parts on that MIME node and returns the results. Because alternative is special, it has some extra code after calling get_parts. It will only return html if it has an html part, or it will return only the text part of it has a text part. If it has neither, it won't return anything valid.
The advantage with the hash of valid content types is that I can easily add logic for more parts as needed. And by the time you get_parts is done, you should have an array of all content you care about.
One more item I should mention. As a part of this, we created a separate domain that actually serves these messages. The main domain that an admin works on will refuse to serve the message and redirect the browser to our user content domain. This second domain will only serve user content. This is to help the browser properly sandbox the content away from our main domain. See same origin policy (http://en.wikipedia.org/wiki/Same_origin_policy)

It doesn't sound like a difficult job to me:
use Email::MIME;
my $parsed = Email::MIME->new($message);
my #parts = $parsed->parts; # These will be Email::MIME objects, too.
print <<EOF;
<html><head><title>!</title></head><body>
EOF
for my $part (#parts) {
my $content_type = $parsed->content_type;
if ($content_type eq "text/plain") {
print "<pre>", $part->body (), "</pre>\n";
}
elsif ($content_type eq "text/html") {
print $part->body ();
}
# Handle some more cases here
}
print <<EOF;
</body></html>
EOF

Reuse existing complete software. The MHonArc mail-to-HTML converter has excellent MIME support.

Related

Magento - How to format the email address in "Store Email Addresses"?

What I'm trying to do is format the email address like so Foo Bar <foo#bar.com> so email clients (including web based like gmail) will show the sender as "Foo Bar" and not just "foo" since that is the prefix on the email.
Is this possible in Magento 1.5? If I try to put the formatting in the field itself I get an error message.
That's what the Sender Name field does. This is what my setup looks like and what it looks like in Thunderbird (my webmail client formats it similarly, too):
You may look at code/core/Mage/Core/Model/Email.php for the actual mail implementation. You may notice it uses Zend_Mail. Now, here is the catch:
For security reasons, Zend_Mail filters all header fields to prevent header
injection with newline (\n) characters. Double quotation is changed to single
quotation and angle brackets to square brackets in the name of sender and recipients.
If the marks are in email address, the marks will be removed.
What you can do though is to make a module to rewrite the current class Mage_Core_Model_Email and overwrite the send() function. I covered rewriting classes before in this question's answer. Then, in this new rewritten class, you could PHPmailer, invoke shell mail commands, or whatever PHP library you would happen to know that would allow this behaviour.
Best of luck.
I'm not sure if it can work. In magento inside is a Zend_Validate which does validation of such email addresses. And I dont think so that email like 'Foo Bar ' is valid. But I think the display in customer's email client depend on client, no?

Encoding a email address that can be used as part of a URL in codeigniter

Is there a way to encode a email address that can be used as a part of a url in codeigniter?. I need to decode back the email address from the url.
What I am trying to do is just a -forgotten password recovery- thing. I send a confirmation link to the user's email address, the link needs to be like ../encodedEmail/forgottenPasswordCode (with the forgottenPasswordCode updated in the db for the user with the submitted email).
When the user visits that link, I decode the email(if the email - forgottenPasswordCode pair is in the table), i allow them to reset their password (and i reset forgottenPasswordCode back to null).
I could just do a loop -checking the table with a select query- (or) -set that forgottenPasswordCode column unique, so i keep generating on a insert failure(would that be a lot faster ?)- until I generate a forgottenPasswordCode that doesn't already exist in the table.
But the guy I do this for would not accept it this way:). He wants the checking be done with the user's email, he thinks its much faster.
I am working with codeigniter, I used its encode() function, it seems to produce characters like '-slashes-' at times that breaks the encoded-email-string.
Any other ideas?
try using bin2hex() and hex2bin() function,
<?php
function hex2bin($str)
{
$bin = "";
$i = 0;
do
{
$bin .= chr(hexdec($str{$i}.$str{($i + 1)}));
$i += 2;
} while ($i < strlen($str));
return $bin;
}
$str = 'email#website.com';
$output = bin2hex($str);
echo $output . '<br/>';
echo hex2bin($output);
?>
Don't put data in the URL that doesn't have some sort of meaning. This leaves two choices:
Send the address as part of a POST. If it's coming from a web form this is the way to go.
Refer to the address in the database using an ID or hashed value. If you need the user to click a link referring to their account, use something that clearly refers to their account. If you need to refer to an instance of a password reset (many systems do this), add a table containing hashes, using that hash in the URL.
Why not just encode it in the URL?
You can see URLs (it's part of the UI), encoded things look weird
URLs represent resources, things in your app (users probably already have IDs)
Encoded email addresses are long (making these URLs harder to work with in things like emails)
Try to keep parameters in URLs to clear references to concepts in your web app (point at one user by ID or plaintext name, for example). Parameters that don't fit in URLs go in POST parameters. If you must use something encoded in a URL, prefer one-way-encoding and database lookups.
Although it may be not optimal design solution to use email as a part URL,
use email as base64 encoded string to avoid any issues with special chars
E.g. Base64 encoded string 'abc-def#example.com' is
YWJjLWRlZkBleGFtcGxlLmNvbQ==
In your case the URL is
../YWJjLWRlZkBleGFtcGxlLmNvbQ==/forgottenPasswordCode
All you need is to decode that string back before usage

What's a good Perl OO interface for creating and sending email?

I'm looking for a simple (OO?) approach to email creation and sending.
Something like
$e = Email->new(to => "test <test#test.com>", from => "from <from#from.com>");
$e->plain_text($plain_version);
$e->html($html_version);
$e->attach_file($some_file_object);
I've found Email::MIME::CreateHTML, which looks great in almost every way, except that it does not seem to support file attachments.
Also, I'm considering writing these emails to a database and having a cronjob send them at a later date. This means that I would need a $e->as_text() sub to return the entire email, including attachments, as raw text which I could stuff into the db. And so I would then need a way of sending the raw emails - what would be a good way of achieving this?
Many thanks
You have to read the documentation more carefully, then two of your three questions would be moot.
From the synopsis of Email::MIME::CreateHTML:
my $email = Email::MIME->create_html(
You obviously get an Email::MIME object. See methods parts_set and parts_set for so called attachments.
Email::MIME is a subclass of Email::Simple. See method as_string for serialising the object to text.
See Email::Sender for sending mail.
You might check out perl MIME::Lite.
You can get the message as a string to save into a database:
### Get entire message as a string:
$str = $msg->as_string;
Email::Stuff is a nice wrapper for Email::MIME. You don't need to care about the MIME structure of the mail, the module does it for you.
Email::Stuff->from ('cpan#ali.as' )
->to ('santa#northpole.org' )
->bcc ('bunbun#sluggy.com' )
->text_body($body )
->attach (io('dead_bunbun_faked.gif')->all,
filename => 'dead_bunbun_proof.gif')
->send;
It also has as_string.

How does the email header field 'thread-index' work?

I was wondering if anyone knew how the thread-index field in email headers work?
Here's a simple chain of emails thread indexes that I messaged myself with.
Email 1 Thread-Index: AcqvbpKt7QRrdlwaRBKmERImIT9IDg==
Email 2 Thread-Index: AcqvbpjOf+21hsPgR4qZeVu9O988Eg==
Email 3 Thread-Index: Acqvbp3C811djHLbQ9eTGDmyBL925w==
Email 4 Thread-Index: AcqvbqMuifoc5OztR7ei1BLNqFSVvw==
Email 5 Thread-Index: AcqvbqfdWWuz4UwLS7arQJX7/XeUvg==
I can't seem to say with certainty how I can link these emails together. Normally, I would use the in-reply-to field or references field, but I recently found that Blackberrys do NOT include these fields. The only include Thread-Index field.
They are base64 encoded Conversation Index values. No need to reverse engineer them as they are documented by Microsoft on e.g. http://msdn.microsoft.com/en-us/library/ms528174(v=exchg.10).aspx and more detailed on http://msdn.microsoft.com/en-us/library/ee202481(v=exchg.80).aspx
Seemingly the indexes in your example doesn't represent the same conversation, which probably means that the software that sent the mails wasn't able to link them together.
EDIT: Unfortunately I don't have enough reputation to add a comment, but adamo is right that it contains a timestamp - a somewhat esoteric encoded partial FILETIME. But it also contains a GUID, so it is pretty much guarenteed to be unique for that mail (of course the same mail can exist in multiple copies).
There's a good analysis of how exactly this non-standard "Thread-Index" header appears to be used, in this post and links therefrom, including this pdf (a paper presented at the CEAS 2006 conference) and this follow-up, which includes a comment on the issue from the evolution source code (which seems to reflect substantial reverse-engineering of this undocumented header).
Executive summary: essentially, the author eventually gives up on using this header and recommends and shows a different approach, which is also implemented in the c-client library, part of the UW IMAP Toolkit open source package (which is not for IMAP only -- don't let the name fool you, it also works for POP, NNTP, local mailboxes, &c).
I wouldn't be surprised if there are mail clients out there which would not be able to link Blackberry's mails to their threads. The Thread-Index header appears to be a Microsoft extension.
Either way, Novell Evolution implements this. Take a look at this short description of how they do it, or this piece of code that finds the thread parent of a given message.
I assume that, because the lengths of the Thread-Index headers in your example are all the same, these messages were all thread starts? Strange that they're only 22-bytes, though I suppose you could try applying the 5-bytes-per-message rule to them and see if it works for you.
If you are interested in parsing the Thread-Index in C# please take a look at this post
http://forum.rebex.net/questions/3841/how-to-interprete-thread-index-header
The snippet you will find there will let you parse the Thread-Index and retrieve the Thread GUID and message DateTime. There is a problem however, it does not work for all Thread-Indexes out there. Question is why do some Thread-Indexes generate invalid DateTime and what to do to support all of them???

How to make an email bot that replies to users not reply to auto-responses and get itself into mail loops

I have a bot that replies to users. But sometimes when my bot sends its reply, the user or their email provider will auto-respond (vacation message, bounce message, error from mailer-daemon, etc). That is then a new message from the user (so my bot thinks) that it in turn replies to. Mail loop!
I'd like my bot to only reply to real emails from real humans. I'm currently filtering out email that admits to being bulk precedence or from a mailing list or has the Auto-Submitted header equal to "auto-replied" or "auto-generated" (see code below). But I imagine there's a more comprehensive or standard way to deal with this. (I'm happy to see solutions in other languages besides Perl.)
NB: Remember to have your own bot declare that it is autoresponding! Include
Auto-Submitted: auto-reply
in the header of your bot's email.
My original code for avoiding mail loops follows. Only reply if realmail returns true.
sub realmail {
my($email) = #_;
$email =~ /\nSubject\:\s*([^\n]*)\n/s;
my $subject = $1;
$email =~ /\nPrecedence\:\s*([^\n]*)\n/s;
my $precedence = $1;
$email =~ /\nAuto-Submitted\:\s*([^\n]*)\n/s;
my $autosub = $1;
return !($precedence =~ /bulk|list|junk/i ||
$autosub =~ /(auto\-replied|auto\-generated)/i ||
$subject =~ /^undelivered mail returned to sender$/i
);
}
(The Subject check is surely unnecessary; I just added these checks one at a time as problems arose and the above now seems to work so I don't want to touch it unless there's something definitively better.)
RFC 3834 provides some guidance for what you should do, but here are some concrete guidelines:
Set your envelope sender to a different email address than your auto-responder so bounces don't feed back into the system.
I always store in a database a key of when an email response was sent from a specific address to another address. Under no circumstance will I ever respond to the same address more than once in a 10 minute period. This alone stopped all loops, but doesn't ensure nice behavior (auto-responses to mailing lists are annoying).
Make sure you add any permutation of header that other people are matching on to stop loops. Here's the list I use:
X-Loop: autoresponder
Auto-Submitted: auto-replied
Precedence: bulk (autoreply)
Here are some header regex's I use to avoid loops and to try to play nice:
/^precedence:\s+(?:bulk|list|junk)/i
/^X-(?:Loop|Mailing-List|BeenThere|Mailman)/i
/^List-/i
/^Auto-Submitted:/i
/^Resent-/i
I also avoid responding if any of these are the envelop senders:
if ($sender eq ""
|| $sender =~ /^(?:request|owner|admin|bounce|bounces)-|-(?:request|owner|admin|bounce|bounces)\#|^(?:mailer-daemon|postmaster|daemon|majordomo|ma
ilman|bounce)\#|(?:listserv|listsrv)/i) {
That really sounds like something that's probably available as a module from CPAN, but I didn't find anything clearly relevant in five minutes of searching. Mail::Lite::Mbox::Processor looks like it might do what you want:
Mail::Lite::Message::Matcher is a
framework for automated mail
processing. For example you have a
mail server and you have a need to
process some types of incoming mail
messages automatically. For example,
you can extract automated
notifications, invoices, alerts etc.
from your mail flow and perform some
tasks based on content of those
messages.
but its docs are sparse enough that it isn't immediately obvious whether it provides those example functions itself or if you have to provide the code to drive them.
In any case, though, if you haven't already checked CPAN, that's where I would start if I wanted to do something like this.
My answer here only deals with bounces which is more straightforward.
Using DSN (Delivery Status Notification) identifier will help you detect a DSN/bounced message. It should go to Return-Path and not Reply-To.
Here's a sample of a typical DSN message. The header information includes the message id, content type has specific values (delivery-status) etc.
Not able to provide you any codes in perl, just my 2 cents of idea.
PS: Do note that not all mail servers or MTA conforms to this, but I guess most do.
There should be a standard way of dealing with this, but the problem is that you'd have to assume that systems that send auto-replies comply to that standard, when most the time, they just don't.
How do you get the address that you reply to? I hope you aren't using the From: header. Check the Reply-to: header first and if that doesn't exist, use the Return-path:.
But whatever you do, you will simply have to keep a log of what you sent to whom and throttle your bot to some sensible value of messages per time.