How do I extract the SMTP envelope and header with Perl? - perl

What is SMTP Envelope and SMTP header and what is the relationship between those? How do I extract them with Perl?

An SMTP message contains a set of headers such as From, To, CC, Subject and a whole range of other stuff.
An SMTP Envelope is simply the name given to a small set of header prefixed to the standard SMTP message when the message is moved about by the Message Transport Agent (ie. the SMTP server). The most common envelope headers are X-Sender, X-Receiver and Received.
For example Microsofts SMTP Server will add the X-Sender and a series of X-Receiver headers to the top of a message when it drops the message into its Drop folder. There will be one X-Receiver for each post box that matches the domain the Drop folder is for.
Another example is SMTP servers add a Receive: header when it receives a message from another SMTP server. This header gives various details of the exchange. Hence most emails on the tinternet once arrived at the final destination will have a series of Receive headers indicating the SMTP server hops the message took to arrive. Usually servers remove the X-Sender, X-Receiver headers when the message is finally moved to a POP3 mailbox.
Accessing Headers
On the windows platform the only way I've found to access the envelope headers is to simply open and parse the eml file. Its a pretty simple format (name: value CR LF).
Again on the windows platform the main set of message headers and body parts can be accessed using the CDOSYS.dll COM based set of objects. How you would do this on other platforms I don't know. However the header format is quite straight forward as per the envelope headers, its accessing the body parts that would require more creative coding.

The envelope is the addressing information sent to the server during the initial conversation via the "MAIL FROM:" and "RCPT TO:" commands.
The SMTP header is the collection of header lines which are sent after the DATA command is issued.
How you find them is dependant on how/where you're getting the message from, and we'd need a lot more clues to attempt to answer that.

You can actually think of three different things here. There are the directives that were exchanged between the SMTP MTAs (during each hop the message took) ... the headers that were generated by the MUA and headers that were added (or modified) by MTAs along the route that a given message traversed.
The "envelope" refers to the information provided to the MTA (normally the most recent or final destination MTA). The sender includes a set of headers after the DATA directive in the SMTP connection (separated from the body of the message by a blank line ... but double check the RFC if that's specifically supposed to be a CR/LF pair). Note that the local MTA may add additonal headers and might even modify some headers before storing or forwarding the message.
(Normally it should only add Received-by: headers).
Some MTAs are configured to add X-Envelope-To: and/or X-Envelope-From: headers. Some of them will still filter the contents of these headers (for example to prevent leakage of blind copies). (Senario: the original MUA had a BCC: line directory that a number of people be copied on the message with their names all appearing to one another in the CC: headers; for each recipient domain (MX result) the MTA will only issue RCPT TO: for only the subset of addresses for which the host if the appropriate result (its own hub, smarthost, or any valid MX for the target) --- thus any subsets of recipients who share an MX with each other would see leakage in the X-Envelope-To: headers generated by MTAs that were sloppy about the handling of this detail).
Also not that an Envelope-From line would only contain a host/domain name as supplied by the HELO FROM: or EHLO FROM: directives in the SMTP exchange. It cannot be used as a return address, for replies for example.

For Perl email related stuff have a look at the Perl Email Project.

Related

Email messages: header section of an email-message, email-message envelope, email-message body and SMTP

Please, let me know whether my understanding of these things is correct:
Email-message consits of two parts: email-message envelope and contents of an email-message.
Email-message envelope is an information that is formed step by step during
SMTP-handhaking phases on a path of a message from sender's mailbox
to reciever's mailbox (during exchanging command-reply pairs between each
SMTP-client and SMTP-server on a path).
Header section is one of the two parts of a content of an email-message
(along with message body), and initially consists of the lines written by sender along with the body of the message in his email client.
Body of the email message - this thing is self-explanatory.
RFC 5322 specifies the format of the email-message content (header section and message body), and RFC 5321 specifies the work of the SMTP protocol.
Although header section of the email-message is initially formed by sender of email-message along with the body of the message (in his email-client), this header section may be further extended with some header fields containing envelope information during the path of the email-message through different SMTP-servers. For example, SMTP-server can append a new "Received:" header line to the header section. These modifications of header section must be performed according to the rules from RFC 5321 and after each of these modifications the resulting header section must me consistent with the format specified in RFC 5322.
When we open a received email-message with an email-client GUI, we see only the message body and a part of the header section of the message that was initially written by sender of email-message in his email client. But if we want to look at the full header section of the message (with those header lines containing envelope information, that were appended by SMTP servers) we can use options like "show original" in Gmail.
Since nobody answered, I will do it - I have spent some days looking for information about it, have read in RFC 5321 and RFC 5322 the most important moments and hasn't found any contradictions to the assumptions in the question.

Multiple To and Cc headers in MIME message sent through LotusScript

I'm building a LotusScript agent looping through a set of documents then - based on a given condition - create mail messages with formatted html text. The recipients will be mostly Non-Notes users (Outlook etc) that's why I want to make sure that subject and message body are formatted correctly. At least one copy is sent to a Domino mail-in database, though.
The code basically creates a MimeEntity, sets "To", "CC" and "Subject" headers then puts a pre-configured message into the mail body and sends it off.
In regards to the body I experimented both with a simple MimeEntity formatted as "text/html" as well as with a multipart message (Content-Type = "multipart/alternative") with 2 child entities (1: "text/plain" without any formatting, 2: "text/html" i.e. html-formatted); in my final code I plan to go for the latter method.
What is really weird is that the recipients (using Outlook as well as other mail clients like Thunderbird) see 3 "To:" and 3 "Cc:" items instead of just one. Looking at the doc in the receiving Domino mail-in database there is only one instance of each item (i.e. SendTo and CopyTo).
Here's the message's source code (taken from Thunderbird) showing those 3 instances of each item:
Return-Path: <sendername#myorg.de>
Received: (removed info here)
Subject: =?UTF-8?B?RWluIGdlbcO8dGxpY2hlcyBzaW1wbGVzIFRlc3RtYWlsIGF1cyBTT1A=?=
MIME-Version: 1.0
Auto-Submitted: auto-generated
To: user1#orgext1.de, user2#orgext2.de
CC: my-mail-in-db#myorg.de
To: user1#orgext1.de, user2#orgext2.de
CC: my-mail-in-db#myorg.de
To: user1#orgext1.de, user2#orgext2.de
CC: my-mail-in-db#myorg.de
Message-ID: <OFBCA50979.C1582837-ONC125856E.00548385-C125856E.0054838A#MYORG.DE>
From: Lothar Mueller <sendername#myorg.de>
This the basic code creating these mails (the simple non-multipart version):
Set docMemo = db.Createdocument()
Call docMemo.Replaceitemvalue("Form", "Memo")
Set nMimeBody = docMemo.Createmimeentity()
'SendTo
Set nMimeHead = nMimeBody.Createheader("To")
Call nMimeHead.Setheaderval("user1#otherorg.de,user2#3rdorg.de")
'CopyTo
Set nMimeHead = nMimeBody.Createheader("CC")
Call nMimeHead.Setheaderval("my-mail-in-db")
'Subject
Set nMimeHead = nMimeBody.Createheader("Subject")
Call nMimeHead.Addvaltext("Subject with ä-ö-ü-ß", "UTF-8")
'html version only for simple non-multipart MIME
Call nStream.Writetext({<p style="font-weight:bold;">Some simple formatted HTML content</p>})
Call nMimeBody.Setcontentfromtext(nStream, {text/html; charset="UTF-8"}, ENC_NONE)
Call nStream.Close()
'finally send
Call docMemo.Send(False)
Now, I can work around this behavior by simply setting the recipients as plain old Notes items, like:
Call docMemo.SendTo = recipientArray
Call docMemo.CopyTo = copyArray
instead of setting those values as MIME headers. In this case there are no more multiple instances of "To" and "CC" items at the recipients' mail clients.
I know that I did this already some years ago in a different project, and back then I didn't have those problems.
Anyone having an idea what could be the cause for this? Could it be due to the Domino version in use (now it's 10.0.1 FP4, back then it was some 9.0.1 version)?
Guess I found the cause for this, at least partially:
As I mentioned in an update to my post this behavior only can be observed when the agent is running in the client as opposed to running on the server:
examining the resulting mail through Ytria's scanEZ I find that there's a difference in regards to the fields that are created:
the run-on-server version just creates the expected fields "To:" and "Cc:" which turn up as "SendTo" and "CopyTo" in the resulting Notes document
If the code is running in the client some more fields are created in the Notes document: in addition to the standard fields there are also "INetSendTo", INetCopyTo, "AltSendTo" and "AltCopyTo". I assume that those extra fields are then rendered by the router to become addition "To:" and "Cc:" header items.
Thanks again to #DaveDelay for bringing up that idea regarding the router and mail.box

Persistence of custom headers within an email thread

I this is probably a strange question, but I thought I'd go ahead and ask. Say, I send an email, using IMAP SMTP, through a special client. This client adds a few custom headers to the email message before sending it on its way. The recipient receives this email and responds to me directly (and maybe CC's a few people as well).
My question is this: Given the above example, would these X-headers persist throughout all the new messages within the thread?
One thing I can think of is the client would be aware of the original email message it sent. All subsequent responses to this email would have a "Reply-To" header whose value equals the "Message-Id" of the previous email. I don't see why I couldn't crawl up these thread of replies until I get to the original message sent by the client, thereby deriving the original custom headers.
Maybe I'm over-thinking this. Any suggestions? :)
A message reply does not necessarily contain anything of the original message. The MUA is likely to suggest a modified (e.g. prepended with "Re:") version of the original subject, and obviously the addresses are utilised for appropriate defaults as well. None of the other content of the message forms part of the reply (unless the sender deliberately includes it, as with quoting or forwarding). Any X- headers that you have in your message will certainly not be included in the reply (unless you have control over that MUA).
However, your plan of tracking the original message is certainly feasible: see Section 3.6.4 of RFC 5322. Every message should (not must) have a Message-ID header, and should have In-Reply-To and References headers when appropriate.
The "Message-ID:" field contains a single unique message identifier. The "References:" and "In-Reply-To:" fields each contain one or more unique message identifiers, optionally separated by [whitespace].
In-Reply-To is mention to identify the message (or messages) that is (are) being replied to, while References identifies the entire thread of conversation. The References header is meant to contain the entire contents of the References header of the message being replied to, so you only need the last message to identify the entire thread.
Note that In-Reply-To and Reply-To are not the same thing (the latter specifies the address that the sender wishes replies to be sent to).
Assuming that you have the original message, then you should be able to use the References header of any reply to identify the original message. Not every MUA will handle References or In-Reply-To correctly, but most will.
As far as I know, there's no reason to think any email client would propagate any header lines it doesn't understand. Most will preserve the subject (usually adding "Re: " if necessary) and derive their "To: " and "Cc: " lines from the previous message's headers, but that's about it. I suppose some (but not all) will generate an "In-Reply-To" line, but that's as far as it goes.
Your idea of having a client crawl back through the thread looking for specific headers sounds like it might be do-able, but you'd have to write your own email client if you want that feature, and you'd still be blocked by the fact that not all email clients preserve message threading in any way.

Forwarded email detection

Is there any way to detect (using RFC 2822 headers) that an email is a forwarded email?
There are two things that are normally referred to as "forwarding".
When you set up automatic account-level forwarding to another email address, your mail system will usually introduce an extra header to enable it to detect and break mail loops. Unfortunately, the name of this header has never been standardized. Some use Delivered-To, some use X-Loop, some use X-Original-To, some use an X-header proprietary to their mail software. But there's no single header field that's present all cases.
When you manually forward a message by clicking the "Forward" button in your mailer and entering a recipient email address and some descriptive text, a new message with a new Message-ID header is generated. The set of headers on this message will be indistinguishable from a normal reply -- In-Reply-To and References are set in exactly the same way. The only difference is that the Subject header will usually start with "Fwd:" or end with "(fwd)". ("Usually" because some clients format it as "[Fwd: <original subject>]" with square brackets around the new subject, some clients localize the prefix Fwd: into their own language, and some users manually edit the Subject before hitting "send".)
So there are good hints that a message is forwarded, but no hard and fast rules.
Reading the spec, CTRL+F for "forward" gives the following header fields:
resent-date = "Resent-Date:" date-time CRLF
resent-from = "Resent-From:" mailbox-list CRLF
resent-sender = "Resent-Sender:" mailbox CRLF
resent-to = "Resent-To:" address-list CRLF
resent-cc = "Resent-Cc:" address-list CRLF
resent-bcc = "Resent-Bcc:" (address-list / [CFWS]) CRLF
resent-msg-id = "Resent-Message-ID:" msg-id CRLF
I'm not sure whether the major mail software uses these though.
EDIT
Read the spec a little too quickly, there is also this note:
Note: Reintroducing a message into the transport system and using
resent fields is a different operation from "forwarding".
"Forwarding" has two meanings: One sense of forwarding is that a mail
reading program can be told by a user to forward a copy of a message
to another person, making the forwarded message the body of the new
message. A forwarded message in this sense does not appear to have
come from the original sender, but is an entirely new message from
the forwarder of the message. On the other hand, forwarding is also
used to mean when a mail transport program gets a message and
forwards it on to a different destination for final delivery. Resent
header fields are not intended for use with either type of
forwarding.
There are no other notices of "forwarding", so there are no header fields that you can use to detect the forward, except for the subject = "Fwd: <msg>" convention.

Correct format of an Return-Path header

My application uses sendmail to send outbound email. I set the 'From:' address using the following format:
Fred Dibnah <fred#dibnah.com>
I'm also setting the Reply-To and Return-Path headers using the exact same format.
This seems to work in the vast majority of cases but I have seen at least one instance in which this fails, namely when the name part of the above string contains a period (full stop):
Fred Dibnah, Inc. <fred#dibnah.com>
This fails deep inside the TMail code (I'm using Ruby) but it seems like a perfectly valid thing to do.
My question is, should I actually be setting the Return-Path and Reply-To headers using only the email address as opposed to the above Name + Email format? E.g.
fred#dibnah.com
Thanks.
In a situation like this, it is best to turn to the RFCs.
Upon reading up on your question, it appears as if You shouldn't be setting the Return-Path value ever. The final destination SMTP server is supposed to be setting this value as it transitions the message to your mailbox (http://www.faqs.org/rfcs/rfc2821.html starting at 4.4).
According to http://www.faqs.org/rfcs/rfc2822.html the Reply-To field can have the following formats
local-part "#" domain (fred#dibnah.com for example)
display-name (Fred Dibna for example)
I would recommend using option 1 as it seems to be the most basic, and you will likely have less issues with that format. In choosing option 1, your Reply-To field should look like the following:
Reply-To: fred#dibna.com