Zend mail headers issue - malformed and 'content preview' - email

I am using zend-mail (updated very recently) with IMAP storage to fetch a list of messages, and an inordinate number of them (more than half) report a malformed header.
I have reviewed the bug described in "ZendMail - error in headers", but I think I have a different problem: unlike that error, my failure seems to occur around a 'Content preview' line that appears in many messages.
I've added the failing line text to the error statement:
2018-01-13T11:44:46-05:00 ERR (3): Error reading message 19 - Malformed header detected Content preview: Pacific Operational Science & Technology Conference - POST
2018-01-13T11:44:46-05:00 ERR (3): #0 /var/www/book2/vendor/zendframework/zend-mime/src/Decode.php(149): Zend\Mail\Headers::fromString('Return-Path: <A...', '\r\n')
#1 /var/www/book2/vendor/zendframework/zend-mail/src/Storage/Part.php(112): Zend\Mime\Decode::splitMessage('Return-Path: <A...', 'Return-Path: <A...', '')
The source code isn't much to look at; the text of the email follows the code snippet:
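<?php
require 'vendor/autoload.php'; // Composer autoloader (path assumed)
use Zend\Mail\Storage\Imap;
$mP = 1; // message numbers are 1-based
$mailServer = new Imap(['host' => 'someHost', 'user' => 'someAccount', 'password' => 'somePassword']);
$eMessage = $mailServer->getMessage($mP);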
The text from the email follows:
message has been attached to this so you can view it or label
similar future email. If you have any questions, see
root@localhost for details.
Content preview: =============================================================================
Today's topic summary =============================================================================
Group: canvas-lms-users@googlegroups.com Url: https://groups.google.com/forum/ utm_source=digest&utm_medium=email#!forum/canvas-lms-users/topics
To me, it appears that this issue has more to do with the blank lines being interpreted as the end of the header, or with something about the 'Content preview' line itself. I think the lines in question have been added by spam-detection software. Messages without a 'Content preview' line have headers that are processed fine.
Any help?

I believe this is a bug in SpamAssassin. The apparently empty line above the Content preview: actually contains one space. According to RFC 5322 section 3.2.2 this is a MUST NOT, presumably because there is buggy software out there (and I have seen some) that treats such a line as the separator between the headers and the body of the message (the correct separator is a blank line with nothing in it).
So SpamAssassin is producing emails that do not comply with the established Internet standards, and that is a big no-no.
I would be interested to hear of other examples caused by this.
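Since the trace shows Zend\Mail\Headers::fromString rejecting the header block, one workaround until the generating side is fixed is to clean the raw message before parsing it. A minimal sketch in Python, assuming you can get the raw message text; the function names are hypothetical (not part of zend-mail) and the same logic ports directly to PHP. It finds header lines consisting only of spaces or tabs and drops them; such a line can only be a folded continuation, so dropping it removes nothing but whitespace.

```python
import re


def _lines(raw: str) -> list[str]:
    # Split after each "\n" so every piece keeps its own line ending ("\r\n" or "\n").
    return [piece for piece in re.split(r"(?<=\n)", raw) if piece]


def whitespace_only_header_lines(raw: str) -> list[int]:
    """Return 1-based numbers of header lines that hold only spaces or tabs."""
    found = []
    for number, line in enumerate(_lines(raw), start=1):
        content = line.rstrip("\r\n")
        if content == "":
            break  # a truly empty line is the header/body separator
        if content.strip(" \t") == "":
            found.append(number)
    return found


def drop_whitespace_only_header_lines(raw: str) -> str:
    """Remove whitespace-only lines from the header block; the body is left untouched."""
    out = []
    in_headers = True
    for line in _lines(raw):
        content = line.rstrip("\r\n")
        if in_headers:
            if content == "":
                in_headers = False  # keep the separator and everything after it
            elif content.strip(" \t") == "":
                continue  # only folding whitespace, so it is safe to drop
        out.append(line)
    return "".join(out)
```

Whether zend-mail then accepts every such message is not verified here; the sketch only makes the header block conform to the rule quoted above.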

Related

Multiple To and Cc headers in MIME message sent through LotusScript

I'm building a LotusScript agent that loops through a set of documents and then, based on a given condition, creates mail messages with formatted HTML text. Most recipients will be non-Notes users (Outlook etc.), which is why I want to make sure that the subject and message body are formatted correctly. At least one copy is sent to a Domino mail-in database, though.
The code basically creates a MimeEntity, sets the "To", "CC" and "Subject" headers, then puts a pre-configured message into the mail body and sends it off.
For the body, I experimented both with a simple MimeEntity formatted as "text/html" and with a multipart message (Content-Type = "multipart/alternative") with two child entities (1: "text/plain" without any formatting, 2: "text/html", i.e. HTML-formatted); in my final code I plan to use the latter method.
What is really weird is that the recipients (using Outlook as well as other mail clients like Thunderbird) see three "To:" and three "Cc:" items instead of just one. Looking at the document in the receiving Domino mail-in database, there is only one instance of each item (i.e. SendTo and CopyTo).
Here's the message's source code (taken from Thunderbird) showing those 3 instances of each item:
Return-Path: <sendername@myorg.de>
Received: (removed info here)
Subject: =?UTF-8?B?RWluIGdlbcO8dGxpY2hlcyBzaW1wbGVzIFRlc3RtYWlsIGF1cyBTT1A=?=
MIME-Version: 1.0
Auto-Submitted: auto-generated
To: user1@orgext1.de, user2@orgext2.de
CC: my-mail-in-db@myorg.de
To: user1@orgext1.de, user2@orgext2.de
CC: my-mail-in-db@myorg.de
To: user1@orgext1.de, user2@orgext2.de
CC: my-mail-in-db@myorg.de
Message-ID: <OFBCA50979.C1582837-ONC125856E.00548385-C125856E.0054838A@MYORG.DE>
From: Lothar Mueller <sendername@myorg.de>
This is the basic code that creates these mails (the simple non-multipart version):
Set docMemo = db.Createdocument()
Call docMemo.Replaceitemvalue("Form", "Memo")
Set nMimeBody = docMemo.Createmimeentity()
'SendTo
Set nMimeHead = nMimeBody.Createheader("To")
Call nMimeHead.Setheaderval("user1@otherorg.de,user2@3rdorg.de")
'CopyTo
Set nMimeHead = nMimeBody.Createheader("CC")
Call nMimeHead.Setheaderval("my-mail-in-db")
'Subject
Set nMimeHead = nMimeBody.Createheader("Subject")
Call nMimeHead.Addvaltext("Subject with ä-ö-ü-ß", "UTF-8")
'html version only for simple non-multipart MIME
'the stream must exist before it can be written to (assumes a NotesSession object named session)
Set nStream = session.Createstream()
Call nStream.Writetext({<p style="font-weight:bold;">Some simple formatted HTML content</p>})
Call nMimeBody.Setcontentfromtext(nStream, {text/html; charset="UTF-8"}, ENC_NONE)
Call nStream.Close()
'finally send
Call docMemo.Send(False)
Now, I can work around this behavior by simply setting the recipients as plain old Notes items, like:
docMemo.SendTo = recipientArray
docMemo.CopyTo = copyArray
instead of setting those values as MIME headers. In this case there are no more multiple instances of "To" and "CC" items at the recipients' mail clients.
I know that I did this already some years ago in a different project, and back then I didn't have those problems.
Does anyone have an idea what could be causing this? Could it be due to the Domino version in use (now it's 10.0.1 FP4; back then it was some 9.0.1 version)?
I think I found the cause, at least partially:
As I mentioned in an update to my post, this behavior can only be observed when the agent runs in the client, as opposed to on the server.
Examining the resulting mail with Ytria's scanEZ, I find a difference in the fields that are created:
The run-on-server version just creates the expected fields "To:" and "Cc:", which turn up as "SendTo" and "CopyTo" in the resulting Notes document.
If the code runs in the client, more fields are created in the Notes document: in addition to the standard fields there are also "INetSendTo", "INetCopyTo", "AltSendTo" and "AltCopyTo". I assume that those extra fields are then rendered by the router as additional "To:" and "Cc:" header items.
Thanks again to @DaveDelay for bringing up the idea regarding the router and mail.box.
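RFC 5322 (section 3.6) allows at most one To: and one Cc: field per message, so repeated fields are a defect on the sending side. To check received copies quickly (for example, when comparing client-run and server-run agents), a small Python sketch using only the standard email package; repeated_headers is a hypothetical helper name.

```python
from email import message_from_string


def repeated_headers(raw: str, names=("To", "Cc")) -> dict:
    """Return {header name: count} for each listed header that occurs more than once."""
    msg = message_from_string(raw)
    repeated = {}
    for name in names:
        # Header lookup is case-insensitive, so "CC" is counted as "Cc".
        values = msg.get_all(name, [])
        if len(values) > 1:
            repeated[name] = len(values)
    return repeated
```

Fed the Thunderbird source shown above, it would report three To: and three Cc: fields.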

phpmailer error codes for outcome processing

I am building a mailout capability and it is working OK as far as it goes. However, I want to distinguish between various potential (high level) outcomes in order to determine what happens to each message after the current send attempt.
This must be a common requirement, so I'm probably missing something pretty obvious, but I can't find anything that addresses it, either here, via Google or on the PHPMailer site. Possibly that's because there are so many questions about specific errors that I just can't find anything useful among all the other results.
At a very high level:
Attempt the send and assess the resulting error/result to identify whether the message has been sent, must be retried later, or has failed permanently.
- success -> update message status to 'SENT: OK'
- sent, but with some issues (e.g. one recipient failed, others processed OK) -> 'SENT: some error'
- failed due to a temporary problem (e.g. connection problem, attachment open) -> 'TRY LATER'
- failed due to a message-specific problem, so we should NOT try to resend -> 'FAILED: some error'
As I was unable to find an existing resource with, e.g., a table of errors, I spent some time working through the phpmailerException code to try to build one myself, but it's not simple because (a) the exceptions don't appear to have been designed with this kind of grouping in mind, and (b) it is not easy to uniquely identify a particular error: PHPMailer provides human-friendly messages, which differ between languages, rather than an identifiable code. Given that my solution will need to work across installations in different languages, that's a problem!
Obviously SMTP itself provides a range of error codes which I could potentially use for this purpose, but how do I access these via PHPMailer? (This would work for me, as I only use SMTP at this point; however, it would NOT work if another message transport such as sendmail were used, so I would prefer a PHPMailer solution.)
If you want individual result codes for individual addresses, you really need to send each message separately. If you do get errors on some recipients, they will be listed in the ErrorInfo property; look in the smtpSend function to see how the error string is assembled. I agree that it's not especially easy to parse that info out. The error messages in PHPMailer are generally meant more for the developer than for the end user, so the translations are not that significant. You can get slightly more information about errors if you enable exceptions rather than relying only on return values.
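The four outcomes map fairly directly onto SMTP reply-code classes (RFC 5321: 2yz success, 4yz transient failure, 5yz permanent failure). PHPMailer's API is not shown here; as an illustration of the grouping logic only, this is a Python sketch built on the standard smtplib conventions: SMTP.sendmail returns a dict of refused recipients, {address: (code, message)}, and raises SMTPRecipientsRefused (carrying the same dict) when every recipient is refused. Outcome and the helper names are hypothetical.

```python
import smtplib
from enum import Enum


class Outcome(Enum):
    SENT_OK = "SENT: OK"
    SENT_PARTIAL = "SENT: some error"
    TRY_LATER = "TRY LATER"
    FAILED = "FAILED: some error"


def outcome_for_code(code: int) -> Outcome:
    """Classify one SMTP reply code by its first digit."""
    if 200 <= code < 300:
        return Outcome.SENT_OK
    if 400 <= code < 500:
        return Outcome.TRY_LATER  # 4yz: transient negative completion
    return Outcome.FAILED  # 5yz (or anything unexpected): treat as permanent


def outcome_for_refusals(recipients, refused: dict) -> Outcome:
    """refused has the shape smtplib uses: {address: (code, server_message)}."""
    if not refused:
        return Outcome.SENT_OK
    if len(refused) < len(recipients):
        return Outcome.SENT_PARTIAL  # delivered to at least one recipient
    codes = [outcome_for_code(code) for code, _ in refused.values()]
    # Retry only if every refusal was temporary; never resend after a permanent one.
    if all(c is Outcome.TRY_LATER for c in codes):
        return Outcome.TRY_LATER
    return Outcome.FAILED


def outcome_for_exception(exc: Exception, recipients) -> Outcome:
    """Classify an exception raised while sending."""
    if isinstance(exc, smtplib.SMTPRecipientsRefused):
        return outcome_for_refusals(recipients, exc.recipients)
    if isinstance(exc, smtplib.SMTPResponseException):
        return outcome_for_code(exc.smtp_code)  # e.g. sender refused, DATA rejected
    if isinstance(exc, OSError):
        return Outcome.TRY_LATER  # connection refused, timeout, server disconnected
    raise exc
```

A caller would wrap server.sendmail(sender, recipients, message) in try/except, passing the returned dict to outcome_for_refusals or the exception to outcome_for_exception. Sending to one recipient at a time, as suggested above, removes the ambiguous mixed case entirely.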

separate email from original email using perl

When people email each other, they generally include the original email in their reply to a sender, adding a little more information each time to the email. Each email client seems to have a different way of adding the original email to a reply.
I need to parse email arriving at our mail server and try to extract the new part of the message, and I'm wondering whether there is a sensible way to strip this appended (or prepended) information (the "original message") and get just the new information in a mail body. Sadly, I believe there is no special encoding and the original email is simply added to the new message, but I thought I'd check with the experts.
thanks.
No, there is no simple, straightforward algorithm to separate quoted or forwarded text from new content. Quoting and forwarding are poorly standardized and different conventions have existed at different times.
Having said that, Google's Gmail, for example, does fairly well in practice. With enough samples, you can clearly come up with reasonable heuristics.
Good indicators for quoted material are forwarded (pseudo-)headers and indented text, perhaps with a quote indicator along the left margin before the quoted text. You occasionally see outdents as well.
Traditionally, on Usenet in the early 1990s, people would use different, unique quoting styles.
: ~ | This seems to be the original.
: ~ This is the first reply.
: This is the second reply.
This is the third reply, quoting the
previous three messages in sequence.
Around 1995, both clients and standardization initiatives by and large converged on "wedge" quotes:
> >> This seems to be the original.
> > This is the first reply.
> This is the second reply.
This is the third reply, quoting the
previous three messages in sequence.
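For wedge quoting, the simplest heuristic is to count the ">" markers at the start of each line and keep only the unquoted lines. A minimal Python sketch with illustrative names; it does not recognize older prefixes such as ": ~ |", and it would also drop a genuinely new line that happens to start with ">".

```python
import re

# One or more ">" markers at the start of a line, optionally spaced ("> >> text").
QUOTE_PREFIX = re.compile(r"^\s*(?:>\s*)+")


def quote_depth(line: str) -> int:
    """Number of ">" markers in the line's quote prefix (0 for unquoted text)."""
    match = QUOTE_PREFIX.match(line)
    return match.group(0).count(">") if match else 0


def strip_wedge_quotes(body: str) -> str:
    """Keep only the lines that carry no quote prefix."""
    return "\n".join(line for line in body.splitlines() if quote_depth(line) == 0)
```

Applied to the wedge-quoted example above, the depths come out as 3, 2, 1, 0, 0, and only the third reply survives.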
Then along came Microsoft and ruined it all. I suppose that top quoting makes sense in some corporate settings where you quickly need to collect all the background from a thread to a new participant, but even for that purpose it's a horrible abomination.
This is the third reply, quoting the
previous three messages in sequence.
---- Begin forwarded message ----
From: Him [smtp:bogus]
To: His Friend
Subject: VS: Re: Same as on this message
Date: nothing machine-readable
This is the second reply.
---- Alkuperäinen viesti ----
Lähettäjä: His Friend [smtp:poppycock]
Saaja: Some Guy
Aihe: Re: Same as on this message
Päivämäärä: olisiko eilen ehkä
This is the first reply.
----- Original message ----
From: Somebody Else [smtp:mindless]
To: Some Guy
Subject: Same as on this message
Date: like, the day before
This seems to be the original.
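For top-quoted replies like the example above, a complementary heuristic is to cut the body at the first separator line. A Python sketch, assuming separators look like a run of dashes, some text, and another run of dashes; the pattern and function name are illustrative, not taken from any particular client, and Outlook-style blocks that start directly with "From:" are not handled.

```python
import re

# "---- Begin forwarded message ----", "----- Original message ----", ...
# Only the dashes are checked, because clients localize the wording
# ("---- Alkuperäinen viesti ----"). A bare rule of dashes does not match.
SEPARATOR = re.compile(r"^\s*-{2,}\s*[^-\s].*?-{2,}\s*$")


def new_content(body: str) -> str:
    """Return the text above the first forwarded/original-message separator."""
    lines = body.splitlines()
    for index, line in enumerate(lines):
        if SEPARATOR.match(line):
            return "\n".join(lines[:index]).rstrip()
    return body.rstrip()  # no separator found: assume the whole body is new
```

Matching on layout rather than wording is the design choice here: the Finnish separator in the example is caught without a per-language list.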

What heuristics should I use to prevent an autoresponder war?

I am currently extending an e-mail system with an autoresponse feature. In a dark past I've seen some awesome mail loops, and I'm now trying to prevent such a thing from happening to me.
I've looked at how other tools ('mailbot', 'vacation') are doing this, grepped my own mail archive for suspicious mail headers, but I wonder if there is something else I can add.
My process at this point:
Refuse if sender address is invalid (this should get rid of messages with <> sender)
Refuse if sender address matches one of the following:
'^root@',
'^hostmaster@',
'^postmaster@',
'^nobody@',
'^www@',
'-request@'
Refuse if one of these headers (after whitespace normalization and lowercasing) is present:
'^precedence: junk$',
'^precedence: bulk$',
'^precedence: list$',
'^list-id:',
'^content-type: multipart/report$',
'^x-autogenerated: reply$',
'^auto-submit: yes$',
'^subject: auto-response$'
Refuse if sender address was already seen by the autoresponder in the recent past.
Refuse if the sender address is my own address :)
Accept and send autoresponse, prepending Auto-response: to the subject, setting headers Precedence: bulk and Auto-Submit: yes to hopefully prevent some remote mailer from propagating the autoresponse any further.
Is there anything I'm missing?
In my research so far I've come up with these rules.
Treat inbound message as autogenerated, ignore it and blacklist the sender if...
Return-Path header is <> or missing/invalid
Auto-Submitted header is present with any value other than "no"
X-Auto-Response-Suppress header is present
In-Reply-To header is missing
Note: If I'm reading RFC3834 correctly, your own programs SHOULD set this, but so far it seems some autoresponders omit this (freshdesk.com)
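The inbound rules above can be expressed as a single predicate. A Python sketch using the standard email package; is_autogenerated and check_raw are hypothetical names, the "@" test is only a rough stand-in for "invalid", and Return-Path is assumed to have been added by the receiving MTA.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr


def is_autogenerated(msg: Message) -> bool:
    """True means: ignore the message and blacklist the sender."""
    return_path = msg.get("Return-Path")
    if return_path is None:
        return True
    _, address = parseaddr(return_path)
    if "@" not in address:  # "<>" parses to an empty address
        return True
    auto_submitted = msg.get("Auto-Submitted")
    if auto_submitted is not None:
        # The keyword may be followed by parameters, e.g. "auto-replied; ..."
        if auto_submitted.split(";")[0].strip().lower() != "no":
            return True
    if msg.get("X-Auto-Response-Suppress") is not None:
        return True
    if msg.get("In-Reply-To") is None:
        return True
    return False


def check_raw(raw: str) -> bool:
    """Parse a raw message and apply is_autogenerated."""
    return is_autogenerated(message_from_string(raw))
```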
When sending outbound messages, be sure to...
Set the Auto-Submitted: auto-generated header (or auto-replied as appropriate)
Set your SMTP MAIL FROM: command with the null address <>
Note some delivery services including Amazon SES will set their own value here, so this may not be feasible
Check the recipient against the blacklist built up by the inbound side and abort sending to known autoresponders
Consider sending not more than 1 message per unit time (long like 24 hours) to a given recipient
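For the outbound side, a sketch that builds an autoreply carrying the headers listed above and limits replies per recipient. build_autoreply and ReplyThrottle are hypothetical names, the 24-hour window is the example from the list, and the throttle keeps its state in memory only, so a real system would need to persist it.

```python
import time
from email.message import EmailMessage, Message
from email.utils import parseaddr


def build_autoreply(original: Message, from_addr: str, text: str) -> EmailMessage:
    """Compose an autoreply with loop-prevention headers."""
    reply = EmailMessage()
    reply["From"] = from_addr
    reply["To"] = original["From"]
    reply["Subject"] = "Auto-response: " + (original["Subject"] or "")
    reply["Auto-Submitted"] = "auto-replied"  # RFC 3834 keyword for automatic replies
    message_id = original["Message-ID"]
    if message_id:
        reply["In-Reply-To"] = message_id
        reply["References"] = message_id
    reply.set_content(text)
    return reply


class ReplyThrottle:
    """Allow at most one autoreply per address within window_seconds."""

    def __init__(self, window_seconds: float = 24 * 3600, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock
        self.last_sent = {}  # lowercased address -> time of last autoreply

    def allow(self, recipient: str) -> bool:
        _, address = parseaddr(recipient)
        key = address.lower()
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window_seconds:
            return False
        self.last_sent[key] = now
        return True
```

For the null envelope sender, smtplib's send_message accepts an explicit from_addr; passing "<>" should produce MAIL FROM:<>, although, as noted above, some delivery services replace it.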
Notes on other answers and points
I think ignoring Precedence: list messages will cause false positives, at least for my app's configuration
I believe the OP's "auto-submit" rule is a typo and the official header is Auto-Submitted
References
RFC3834
This SO question about Precedence header has several good answers
Wikipedia Email Loop Article
desk.com article
Comments welcome and I'll update this answer as this is a good question and I'd like to see an authoritative answer created.
Update 2014-05-22
To find out whether an inbound message is an "out-of-office" or other automatic reply, we use this procedure:
First, check whether the "In-Reply-To" header is present. If not, the message is an auto-reply.
Otherwise, check whether any of these headers is present:
X-Auto-Response-Suppress (any value)
Precedence (value contains bulk, or junk or list)
X-Webmin-Autoreply (value 1)
X-Autogenerated (value Reply)
X-AutoReply (value YES)
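The procedure above translates almost line for line into code. A sketch with Python's email.message.Message; looks_like_auto_reply is a hypothetical name, and comparing the marker values case-insensitively is an assumption, since the procedure lists them as "1", "Reply" and "YES".

```python
from email.message import Message

# Headers whose value (compared case-insensitively here) marks an auto-reply.
EXACT_MARKERS = {
    "X-Webmin-Autoreply": "1",
    "X-Autogenerated": "reply",
    "X-AutoReply": "yes",
}


def looks_like_auto_reply(msg: Message) -> bool:
    """No In-Reply-To, or one of the marker headers, means auto-reply."""
    if msg.get("In-Reply-To") is None:
        return True
    if msg.get("X-Auto-Response-Suppress") is not None:
        return True
    precedence = (msg.get("Precedence") or "").lower()
    if any(word in precedence for word in ("bulk", "junk", "list")):
        return True
    for name, expected in EXACT_MARKERS.items():
        value = msg.get(name)
        if value is not None and value.strip().lower() == expected:
            return True
    return False
```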
Include a phrase like "This is an automatically-generated response" in the body somewhere. If your message body is HTML (not plain text) you can use a style to make it not visible.
Check for this phrase before responding. If it exists, odds are good it's an automated response.

How does the email header field 'thread-index' work?

I was wondering if anyone knows how the thread-index field in email headers works.
Here are the thread indexes from a simple chain of emails that I sent to myself:
Email 1 Thread-Index: AcqvbpKt7QRrdlwaRBKmERImIT9IDg==
Email 2 Thread-Index: AcqvbpjOf+21hsPgR4qZeVu9O988Eg==
Email 3 Thread-Index: Acqvbp3C811djHLbQ9eTGDmyBL925w==
Email 4 Thread-Index: AcqvbqMuifoc5OztR7ei1BLNqFSVvw==
Email 5 Thread-Index: AcqvbqfdWWuz4UwLS7arQJX7/XeUvg==
I can't say with certainty how to link these emails together. Normally I would use the In-Reply-To or References field, but I recently found that BlackBerrys do NOT include these fields; they only include the Thread-Index field.
They are base64-encoded Conversation Index values. There is no need to reverse-engineer them, as they are documented by Microsoft, e.g. at http://msdn.microsoft.com/en-us/library/ms528174(v=exchg.10).aspx and in more detail at http://msdn.microsoft.com/en-us/library/ee202481(v=exchg.80).aspx
Seemingly the indexes in your example don't represent the same conversation, which probably means that the software that sent the mails wasn't able to link them together.
EDIT: Unfortunately I don't have enough reputation to add a comment, but adamo is right that it contains a timestamp: a somewhat esoterically encoded partial FILETIME. But it also contains a GUID, so it is pretty much guaranteed to be unique for that mail (of course, the same mail can exist in multiple copies).
There's a good analysis of how exactly this non-standard "Thread-Index" header appears to be used in this post and the links from it, including this PDF (a paper presented at the CEAS 2006 conference) and this follow-up, which includes a comment on the issue from the Evolution source code (which seems to reflect substantial reverse-engineering of this undocumented header).
Executive summary: the author eventually gives up on using this header and recommends, and shows, a different approach, which is also implemented in the c-client library, part of the UW IMAP Toolkit open-source package (which is not for IMAP only; don't let the name fool you, it also works for POP, NNTP, local mailboxes, etc.).
I wouldn't be surprised if there are mail clients out there that would not be able to link BlackBerry's mails to their threads. The Thread-Index header appears to be a Microsoft extension.
Either way, Novell Evolution implements this. Take a look at this short description of how they do it, or this piece of code that finds the thread parent of a given message.
I assume, because the lengths of the Thread-Index headers in your example are all the same, that these messages were all thread starts? Strange that they're only 22 bytes, though; I suppose you could try applying the 5-bytes-per-message rule to them and see if it works for you.
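Following the layout in the Microsoft documentation linked above (a 22-byte header block, then one 5-byte block per reply or forward), here is a Python sketch that decodes a Thread-Index and finds a message's parent by dropping the last child block, which is the idea behind the 5-bytes-per-message rule. Two points are assumptions: that the leading reserved byte plus the next five bytes form the top 48 bits of a FILETIME (the common interpretation), and that the GUID uses Microsoft's little-endian layout. Function names are hypothetical.

```python
import base64
import uuid
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
HEADER_LEN = 22  # reserved byte + 5 time bytes + 16-byte GUID
CHILD_LEN = 5  # one block appended per reply/forward


def decode_thread_index(value: str):
    """Return (thread start time, thread GUID, number of child blocks)."""
    raw = base64.b64decode(value)
    if len(raw) < HEADER_LEN or (len(raw) - HEADER_LEN) % CHILD_LEN:
        raise ValueError(f"unexpected Thread-Index length: {len(raw)} bytes")
    # Assumption: the first 6 bytes are the high 48 bits of a FILETIME
    # (100 ns ticks since 1601-01-01 UTC); the low 16 bits are lost.
    ticks = int.from_bytes(raw[:6], "big") << 16
    started = FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
    # Assumption: Microsoft's little-endian GUID byte order.
    guid = uuid.UUID(bytes_le=raw[6:HEADER_LEN])
    return started, guid, (len(raw) - HEADER_LEN) // CHILD_LEN


def parent_thread_index(value: str):
    """Drop the last 5-byte child block; None for a thread start."""
    raw = base64.b64decode(value)
    if len(raw) <= HEADER_LEN:
        return None
    return base64.b64encode(raw[:-CHILD_LEN]).decode("ascii")
```

Each of the five sample values decodes to a bare 22-byte header with no child blocks and a distinct GUID, which fits the observations above that they are separate thread starts rather than one conversation.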
If you are interested in parsing the Thread-Index in C# please take a look at this post
http://forum.rebex.net/questions/3841/how-to-interprete-thread-index-header
The snippet you will find there will let you parse the Thread-Index and retrieve the thread GUID and message DateTime. There is a problem, however: it does not work for all Thread-Indexes out there. The question is why some Thread-Indexes produce an invalid DateTime, and what to do to support all of them.