Can inline images be referenced from a different message in a thread? - email

My research through the RFC says that you can reference inline content from other mail parts using the cid: token. I also know that you can use the mid: token in a similar for message-ID. When referencing a message-ID, you can reference mail parts of another message by doing mid:messageId/contentId, contentId being a valid contentId in the target message.
I'm leaning towards no, inline images (or other inline content) can't be referenced and displayed in entirely different messages. But if that's true, I can't piece together what the purpose of using mid: is.
A simple visualization of what I'm imagining is this:
Given a multipart message with an html body and inline image... our cid reference would look like:
<img src="cid:abcd-i-am-a-content-id">
This assumes we do in fact have a multipart/related with a mail part that has some valid image payload with a matching content-ID.
What if we were replying to this original message, can I do something like:
<img src="mid:original-message-id/abcd-i-am-a-content-id"> to inline this resource that would presumably be accessible by the client's mail store that belongs to the recipient assuming all other normal threading rules are followed?

No, there's no way to do that and expect it to work.
Even "mid:" won't work in most clients.

Related

How can I distinguish between attachment types in exchangelib?

I've just noticed that Microsoft OWA does not display some attachments. Some people use images in their footer (which are attachments). I'm not sure if the only difference between a "normal" attachment and this emedded attachment is that it is embedded in the email.
Is there another difference? How can I get only attachments which OWA* displays as attachements?
* and probably most other email clients; I think I've seen a similar behavior in Google Mail
Those attachments have a content_id. They are referenced within the mail.body as cid:[CONTENT-ID]. The content_id looks like this:
cid:image001.jpg#01D3151A.F9036A80
where image001.jpg is the filename.
looking for cid:image_name inside mail body fails for embedded images with src referring to a link rather than cid.
so the best solution would be to use attachments.is_inline property which is built in exchangelib.
for attachment in msg.attachments:
if msg.has_attachments == True:
if isinstance(attachment, FileAttachment):
if attachment.is_inline:
print("Embeded Image")
else:
print("Normal Attachment")
reference: https://github.com/ecederstrand/exchangelib/issues/562

Persistence of custom headers within an email thread

I this is probably a strange question, but I thought I'd go ahead and ask. Say, I send an email, using IMAP SMTP, through a special client. This client adds a few custom headers to the email message before sending it on its way. The recipient receives this email and responds to me directly (and maybe CC's a few people as well).
My question is this: Given the above example, would these X-headers persist throughout all the new messages within the thread?
One thing I can think of is the client would be aware of the original email message it sent. All subsequent responses to this email would have a "Reply-To" header whose value equals the "Message-Id" of the previous email. I don't see why I couldn't crawl up these thread of replies until I get to the original message sent by the client, thereby deriving the original custom headers.
Maybe I'm over-thinking this. Any suggestions? :)
A message reply does not necessarily contain anything of the original message. The MUA is likely to suggest a modified (e.g. prepended with "Re:") version of the original subject, and obviously the addresses are utilised for appropriate defaults as well. None of the other content of the message forms part of the reply (unless the sender deliberately includes it, as with quoting or forwarding). Any X- headers that you have in your message will certainly not be included in the reply (unless you have control over that MUA).
However, your plan of tracking the original message is certainly feasible: see Section 3.6.4 of RFC 5322. Every message should (not must) have a Message-ID header, and should have In-Reply-To and References headers when appropriate.
The "Message-ID:" field contains a single unique message identifier. The "References:" and "In-Reply-To:" fields each contain one or more unique message identifiers, optionally separated by [whitespace].
In-Reply-To is mention to identify the message (or messages) that is (are) being replied to, while References identifies the entire thread of conversation. The References header is meant to contain the entire contents of the References header of the message being replied to, so you only need the last message to identify the entire thread.
Note that In-Reply-To and Reply-To are not the same thing (the latter specifies the address that the sender wishes replies to be sent to).
Assuming that you have the original message, then you should be able to use the References header of any reply to identify the original message. Not every MUA will handle References or In-Reply-To correctly, but most will.
As far as I know, there's no reason to think any email client would propagate any header lines it doesn't understand. Most will preserve the subject (usually adding "Re: " if necessary) and derive their "To: " and "Cc: " lines from the previous message's headers, but that's about it. I suppose some (but not all) will generate an "In-Reply-To" line, but that's as far as it goes.
Your idea of having a client crawl back through the thread looking for specific headers sounds like it might be do-able, but you'd have to write your own email client if you want that feature, and you'd still be blocked by the fact that not all email clients preserve message threading in any way.

Parse and display MIME multipart email on website

I have a raw email, (MIME multipart), and I want to display this on a website (e.g. in an iframe, with tabs for the HTML part and the plain text part, etc.). Are there any CPAN modules or Template::Toolkit plugins that I can use to help me achieve this?
At the moment, it's looking like I'll have to parse the message with Email::MIME, then iterate over all the parts, and write a handler for all the different mime types.
It's a long shot, but I'm wondering if anyone has done all this already? It's going to be a long and error prone process writing handlers if I attempt it myself.
Thanks for any help.
I actually just dealt with this problem just a few months ago. I added an email feature to the product I work for, both sending and receiving. The first part was sending reminders to users, but we didn't want to manage the bounce backs for our customer admins, we decided to have a message inbox that the admins could see bounces and replies without us, and the admins can deal with adjusting email addresses if they needed to.
Because of this, we accept all email that is sent to an inbox we watch. We use VERP to associate an email with a user, and store the entire email as is in the database. Then, when the admin requests to see the email, we have to parse the email.
My first attempt was very similar to an earlier answer. If one of the parts is html, show it. If it's text, show it. Otherwise, show the original, raw email. This broke down real fast with a few emails not generated by sendmail. Outlook, Exchange, and a few other email systems don't do that, they use multiparts to send the email. After a lot of digging and cussing, I discovered that the problem doesn't appear to be well documented. With the help of looking through MHonArc and reading the RFC's (RFC2045 and RFC2046), I settled on the solution below. I decided on not using MHonArc, since I couldn't easily resuse the parsing and display functionality. I wouldn't say this is perfect, but it's been good enough that we used it.
First, take the message and use Email::MIME to parse it. Then call a function called get_part with the array of parts Email::MIME gives you with ->parts().
get_part, for each part it was passed, decodes the content type, looks it up in a hash, and if it exists, call the function associated with that content type. If the decoder was able to give us something, put it on a result array.
The last piece of the puzzle is this decoder array. Basically, it defines the content types I can deal with:
text/html
text/plain
message/delivery-status, which is actually also plain text
multipart/mixed
multipart/related
multipart/alternative
The non-multipart sections I return as is. With mixed, related and alternative, I merely call get_parts on that MIME node and returns the results. Because alternative is special, it has some extra code after calling get_parts. It will only return html if it has an html part, or it will return only the text part of it has a text part. If it has neither, it won't return anything valid.
The advantage with the hash of valid content types is that I can easily add logic for more parts as needed. And by the time you get_parts is done, you should have an array of all content you care about.
One more item I should mention. As a part of this, we created a separate domain that actually serves these messages. The main domain that an admin works on will refuse to serve the message and redirect the browser to our user content domain. This second domain will only serve user content. This is to help the browser properly sandbox the content away from our main domain. See same origin policy (http://en.wikipedia.org/wiki/Same_origin_policy)
It doesn't sound like a difficult job to me:
use Email::MIME;
my $parsed = Email::MIME->new($message);
my #parts = $parsed->parts; # These will be Email::MIME objects, too.
print <<EOF;
<html><head><title>!</title></head><body>
EOF
for my $part (#parts) {
my $content_type = $parsed->content_type;
if ($content_type eq "text/plain") {
print "<pre>", $part->body (), "</pre>\n";
}
elsif ($content_type eq "text/html") {
print $part->body ();
}
# Handle some more cases here
}
print <<EOF;
</body></html>
EOF
Reuse existing complete software. The MHonArc mail-to-HTML converter has excellent MIME support.

How does the email header field 'thread-index' work?

I was wondering if anyone knew how the thread-index field in email headers work?
Here's a simple chain of emails thread indexes that I messaged myself with.
Email 1 Thread-Index: AcqvbpKt7QRrdlwaRBKmERImIT9IDg==
Email 2 Thread-Index: AcqvbpjOf+21hsPgR4qZeVu9O988Eg==
Email 3 Thread-Index: Acqvbp3C811djHLbQ9eTGDmyBL925w==
Email 4 Thread-Index: AcqvbqMuifoc5OztR7ei1BLNqFSVvw==
Email 5 Thread-Index: AcqvbqfdWWuz4UwLS7arQJX7/XeUvg==
I can't seem to say with certainty how I can link these emails together. Normally, I would use the in-reply-to field or references field, but I recently found that Blackberrys do NOT include these fields. The only include Thread-Index field.
They are base64 encoded Conversation Index values. No need to reverse engineer them as they are documented by Microsoft on e.g. http://msdn.microsoft.com/en-us/library/ms528174(v=exchg.10).aspx and more detailed on http://msdn.microsoft.com/en-us/library/ee202481(v=exchg.80).aspx
Seemingly the indexes in your example doesn't represent the same conversation, which probably means that the software that sent the mails wasn't able to link them together.
EDIT: Unfortunately I don't have enough reputation to add a comment, but adamo is right that it contains a timestamp - a somewhat esoteric encoded partial FILETIME. But it also contains a GUID, so it is pretty much guarenteed to be unique for that mail (of course the same mail can exist in multiple copies).
There's a good analysis of how exactly this non-standard "Thread-Index" header appears to be used, in this post and links therefrom, including this pdf (a paper presented at the CEAS 2006 conference) and this follow-up, which includes a comment on the issue from the evolution source code (which seems to reflect substantial reverse-engineering of this undocumented header).
Executive summary: essentially, the author eventually gives up on using this header and recommends and shows a different approach, which is also implemented in the c-client library, part of the UW IMAP Toolkit open source package (which is not for IMAP only -- don't let the name fool you, it also works for POP, NNTP, local mailboxes, &c).
I wouldn't be surprised if there are mail clients out there which would not be able to link Blackberry's mails to their threads. The Thread-Index header appears to be a Microsoft extension.
Either way, Novell Evolution implements this. Take a look at this short description of how they do it, or this piece of code that finds the thread parent of a given message.
I assume that, because the lengths of the Thread-Index headers in your example are all the same, these messages were all thread starts? Strange that they're only 22-bytes, though I suppose you could try applying the 5-bytes-per-message rule to them and see if it works for you.
If you are interested in parsing the Thread-Index in C# please take a look at this post
http://forum.rebex.net/questions/3841/how-to-interprete-thread-index-header
The snippet you will find there will let you parse the Thread-Index and retrieve the Thread GUID and message DateTime. There is a problem however, it does not work for all Thread-Indexes out there. Question is why do some Thread-Indexes generate invalid DateTime and what to do to support all of them???

parsing email message

Just want a basic understand of what parts a email message may have.
I know there is a messageId, date, subject, from, cc, bcc, body, etc.
Specifically I want to know how attachments and images may be embedded in the email.
At this point I think there are 2, please correct me if I am wrong.
attachments
embedded attachments/images
is that correct?
The official answer for this question is contained in RFC5322 and some related RFC's. The Wikipedia entry for email does a pretty good job of referencing the RFC numbers. To get started with MIME see RFC2045.
Attachments are encoded as multipart similar to multipart file uploads. Basically the message has a header saying there is an attachment and sets a boundary ( random string of characters to announce the start of the attachment) The boundary says when the data of the attachment starts. I think the filename is set on the boundary as well (if i remember correctly). I am doing a bit of hand waving, but this is the basic idea.
so you get somthing like
To: ...
From: ...
Content-Type: Multpart...
Content-Boundry: ewafoiuasfjasdfoashiafhj
message here
--------- Content-boundry: ewafoiuasfjasdfoashiafhj
attachement here