How to extract email message body, without signature or quoted text

What software can I use to process raw email text and remove the signature, quoted thread text, and so on?
For example, here is an email. I would like to get just the "Thanks guys." text (or more, if there were more text there). I do not want the HTML signature (the first red block) or the old emails that the person was replying to (the second red block).

You can try Message.get_payload from Python's email package.
import email

with open('test.txt', 'r') as myfile:
    data = myfile.read()

body = email.message_from_string(data)
if body.is_multipart():
    # Each sub-part is itself a Message; print its payload
    for payload in body.get_payload():
        print(payload.get_payload().strip())
else:
    print(body.get_payload().strip())
It outputs:
this is the body text
this is the attachment text
The test.txt file contains the following.
From: John Doe <example@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="XXXXboundary text"

This is a multipart message in MIME format.

--XXXXboundary text
Content-Type: text/plain

this is the body text

--XXXXboundary text
Content-Type: text/plain;
Content-Disposition: attachment;
    filename="test.txt"

this is the attachment text

--XXXXboundary text--
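The snippet above prints every part, including the attachment, and leaves signatures and quoted replies in place. As a minimal sketch of one way to go further, assuming Python 3.6+ and only the standard library: pick the first non-attachment text/plain part with get_body, then apply a few heuristics. The helper names plain_body and strip_reply_text, the sample message, and the heuristics themselves (a "-- " signature delimiter, lines starting with ">", an "On ... wrote:" header) are assumptions for illustration, not a standard; real mail clients vary widely.

```python
import re
from email import message_from_string, policy

# Assumed heuristics (not a standard): replies often start with "On ... wrote:",
# quoted lines start with ">", and signatures follow a "-- " delimiter line.
REPLY_HEADER = re.compile(r"^On .+ wrote:\s*$")


def plain_body(raw: str) -> str:
    """Return the first non-attachment text/plain part of a raw message."""
    msg = message_from_string(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else ""


def strip_reply_text(body: str) -> str:
    """Drop quoted lines and everything after a signature or reply header."""
    kept = []
    for line in body.splitlines():
        if line.rstrip() == "--" or REPLY_HEADER.match(line):
            break  # signature delimiter or start of the quoted thread
        if line.lstrip().startswith(">"):
            continue  # quoted line from an earlier message
        kept.append(line)
    return "\n".join(kept).strip()


# Hypothetical sample message, loosely based on test.txt above
RAW = """\
From: John Doe <example@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XXXXboundary text"

--XXXXboundary text
Content-Type: text/plain

Thanks guys.

--
John Doe

On Mon, 1 Jan 2024, Jane wrote:
> Here is the file you asked for.

--XXXXboundary text
Content-Type: text/plain
Content-Disposition: attachment; filename="test.txt"

this is the attachment text

--XXXXboundary text--
"""

print(strip_reply_text(plain_body(RAW)))
```

For this sample the sketch is meant to keep only the new text. HTML signatures and client-specific quote markers would need extra rules, so treat it as a starting point rather than a complete solution.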

Related

While connecting telnet I got errors like "556 5.7.5 Invalid RFC missing body"

I tried to verify email addresses using SMTP. It works fine for all mailboxes except Yahoo email IDs. While connecting via telnet I got errors like "556 5.7.5 Invalid RFC missing body".
Welcome to Stack Overflow, Jayashree.
The error message you are receiving means that your test email has no body part. The body begins immediately after the blank line (two consecutive <CRLF><CRLF>) that ends the header section, here right after the Content-Transfer-Encoding header.
It begins at Good morning. and ends at P | 999.555.1234
Below is a complete example of a properly formatted text/plain email.
To: "Jane Doe" <jane@example1.com>
From: "John Doe" <john@example1.com>
Subject: A text/plain email
MIME-Version: 1.0 (Created with SublimeText 3)
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

Good morning.
This is a message in text/plain format.
It contains text/plain.
It contains no base64 inline images or attachments.
Each line is no more than 76 characters in length.
Please reach out if you have any questions.
Thank you,
John Doe
SENIOR DEVELOPER
XYZ Solutions Inc.
P | 999.555.1234
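To see the header/body separator concretely, here is a small sketch using Python's standard email package (an assumption; the question itself uses telnet). It builds a similar text/plain message, serializes it with CRLF line endings, and splits it at the first blank line. The shortened body text is illustrative only.

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["To"] = "Jane Doe <jane@example1.com>"
msg["From"] = "John Doe <john@example1.com>"
msg["Subject"] = "A text/plain email"
msg.set_content("Good morning.\nThis is a message in text/plain format.\n")

# policy.SMTP serializes with CRLF line endings, as sent on the wire
raw = msg.as_bytes(policy=policy.SMTP)

# The first blank line (CRLF CRLF) separates the headers from the body
headers, body = raw.split(b"\r\n\r\n", 1)
print(headers.decode("ascii"))
print("--- blank line ---")
print(body.decode("ascii"))
```

If the serialized message had no blank line after the last header, there would be nothing for the receiving server to treat as the body.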

How to send mail with text of different colors from Unix using command 'mail'?

I would like to send a message from a Unix server. I use the 'mail' command:
echo "MESSAGE_BODY" | mail -s "MESSAGE_TITLE" somebody@gmail.com
That works fine.
Next I want to send a message with different colors. I tried this command:
echo "<font color="red">MESSAGE_BODY</font>" | mail -s "MESSAGE_TITLE" somebody@gmail.com
But it didn't help. How can I use colors?
A correct "one-liner" answer has already been posted.
I still feel it's better to explain how and why.
The reason you can't just echo HTML code directly into your mail is that the receiving client doesn't know how to display it. It will most likely fall back to plain text, and all you will see when viewing the message is your HTML code.
What you need is to tell the client that the content of your message is composed in HTML. You do this by adding the correct MIME headers to the message.
Content-Type: text/html; charset=UTF-8
MIME-Version: 1.0
Notice that you can also set charset information.
The MIME version is there for better compatibility; also, some SMTP servers will give you a higher spam score if you don't obey the RFC :)
With these headers set, all "BODY" content will be treated as HTML content.
I don't just want to give you a "one-liner"; I think showing it as a script makes it easier to read.
So how about this:
(
  echo "From: my@email.tld";
  echo "To: some@email.tld";
  echo "Subject: Test html mail";
  echo "Content-Type: text/html";
  echo "MIME-Version: 1.0";
  echo "";
  echo "<strong>Testing</strong><br><font color=\"blue\">I'm Blue :)</font>";
) | sendmail -t
Well technically it's still a one-liner :) But it just looks nicer and you can see what's going on!
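If a scripting language is available on the server, the same headers can be produced without hand-writing them. A minimal sketch, assuming Python 3.6+ and its standard email package (not something the original answer uses): it only builds and prints the message, it does not send it.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "my@email.tld"
msg["To"] = "some@email.tld"
msg["Subject"] = "Test html mail"

# subtype="html" sets Content-Type: text/html; MIME-Version is added as well
msg.set_content(
    '<strong>Testing</strong><br><font color="blue">I\'m Blue :)</font>',
    subtype="html",
)

# Headers, a blank line, then the HTML body
print(msg.as_string())
```

The printed text has the same shape as the output of the subshell above, so it could in principle be piped into sendmail -t in the same way.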
Bonus information
If you want to have both HTML and TEXT bodies, you need to look into multipart content types. I have included an example, but you will probably need to read up on this if you don't know much about multipart types.
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0001boundary text"

--0001boundary text
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

The TEXT body goes here

--0001boundary text
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

<strong>HTML code goes here</strong>

--0001boundary text--
As you can see it's no longer some simple mail body.
But I wanted to show you how it's done in case you want to give it a go.
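Getting boundaries right by hand is error-prone, so a library can help here. As a hedged sketch, assuming Python's standard email package (the subject line is a made-up placeholder), set_content followed by add_alternative produces a multipart/alternative message with a generated boundary:

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Test multipart mail"  # hypothetical subject

# The plain-text part comes first; clients that cannot render HTML show it
msg.set_content("The TEXT body goes here")

# Adding an alternative converts the message to multipart/alternative
msg.add_alternative("<strong>HTML code goes here</strong>", subtype="html")

part_types = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type(), part_types)
print(msg.as_string())
```

The parts are ordered from least to most preferred, which is why the HTML version is added last.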
echo '<font color="green">Message body</font>' | mail -s "$(echo -e "Message title\nContent-Type: text/html")" somebody@gmail.com

Multiple Content-Type: text/html inside a multipart/mixed

I am trying to parse emails and received the below message for a reply using Apple Mail.
It's composed of a multipart/mixed containing an attachment and 2 html parts.
In brief:
multipart/alternative
 \--> text/plain
 \--> multipart/mixed
       \--> text/html
       \--> application/pdf
       \--> text/html (empty)
or with the email source:
From: "John Doe" <john@example.com>
... // some headers
Content-Type: multipart/alternative;
    boundary="Apple-Mail=_9331E12B-8BD2-4EC7-B53E-01F3FBEC9227"
Message-Id: <6BB1FAB2-2104-438E-9447-07AE2C8C4A92@sexample.com>
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
... // rest of headers

--Apple-Mail=_9331E12B-8BD2-4EC7-B53E-01F3FBEC9227
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
    charset=us-ascii

My message in text...

--Apple-Mail=_9331E12B-8BD2-4EC7-B53E-01F3FBEC9227
Content-Type: multipart/mixed;
    boundary="Apple-Mail=_CA6C687E-6AA0-411E-B0FE-F0ABB4CFED1F"

--Apple-Mail=_CA6C687E-6AA0-411E-B0FE-F0ABB4CFED1F
Content-Transfer-Encoding: 7bit
Content-Type: text/html;
    charset=us-ascii

<html><head></head><body>My message in HTML...</body></html>

--Apple-Mail=_CA6C687E-6AA0-411E-B0FE-F0ABB4CFED1F
Content-Disposition: inline;
    filename=myfile.pdf
Content-Type: application/pdf;
    name="myfile.pdf"
Content-Transfer-Encoding: base64

... // base64 content

--Apple-Mail=_CA6C687E-6AA0-411E-B0FE-F0ABB4CFED1F
Content-Transfer-Encoding: 7bit
Content-Type: text/html;
    charset=us-ascii

<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"><base></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div><blockquote type="cite"></blockquote></div><br></body></html>

--Apple-Mail=_CA6C687E-6AA0-411E-B0FE-F0ABB4CFED1F--
--Apple-Mail=_9331E12B-8BD2-4EC7-B53E-01F3FBEC9227--
Notice that the last part contains only an empty <blockquote type="cite"></blockquote> (is that because it's a reply?).
Is it valid to have two Content-Type: text/html parts in the same multipart?
Does the last (empty) part have a meaning?
Is it possible for additional text/html parts to contain actual text, or can I always ignore them when parsing (or should I concatenate the parts)?
Thanks
According to RFC 1341 (section 7.2.2):
The primary subtype for multipart, "mixed", is intended for use when
the body parts are independent and intended to be displayed serially.
So I would say Apple's email is valid (even if I don't see the point of the empty part). The two HTML parts should be treated as separate parts to be displayed one after the other (in my case, it is best to concatenate them).
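The "concatenate them" approach can be sketched as follows, assuming Python's standard email package and a shortened stand-in for the Apple Mail message (the OUTER and INNER boundaries and the tiny base64 payload are placeholders, not taken from the real email). It walks every part and collects the text/html ones in order:

```python
from email import message_from_string, policy

# Shortened, hypothetical version of the structure shown in the question
RAW = """\
From: "John Doe" <john@example.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="OUTER"

--OUTER
Content-Type: text/plain; charset=us-ascii

My message in text...
--OUTER
Content-Type: multipart/mixed; boundary="INNER"

--INNER
Content-Type: text/html; charset=us-ascii

<html><body>My message in HTML...</body></html>
--INNER
Content-Disposition: inline; filename=myfile.pdf
Content-Type: application/pdf; name="myfile.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjQK
--INNER
Content-Type: text/html; charset=us-ascii

<html><body><blockquote type="cite"></blockquote></body></html>
--INNER--
--OUTER--
"""

msg = message_from_string(RAW, policy=policy.default)

# walk() visits the message and all nested parts depth-first, in order
html_parts = [
    part.get_content()
    for part in msg.walk()
    if part.get_content_type() == "text/html" and not part.is_attachment()
]

combined_html = "\n".join(html_parts)
print(len(html_parts))
print(combined_html)
```

Joining whole HTML documents this way yields nested html and body tags, so a real parser would probably extract each body's contents before concatenating.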

Turn off quoted-printable encoding in sendmail

I send mail from a monitoring system to another Linux box that handles and parses it. The mail is sent with the following headers:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable
The problem is that the last line of the mail gets split and an = sign gets added:
CRIT - 93.2% used (466.06 of 500.0 GB), (levels at 80.00/90.00%), trend=
: +5.66MB / 24 hours=
Do you have any idea how I can prevent this quoted-printable problem, so that the last line of the mail is not altered by the receiving MUA?
Thank you,
Mario.
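quoted-printable does this by design. Encoded lines are limited to 76 characters, so a longer line is split with a soft line break: a trailing = followed by the line ending (\r\n or \n). Special characters are encoded as = followed by two hexadecimal digits.
The email client sees the MIME header
Content-Transfer-Encoding: quoted-printable
and reverses the encoding to get back the original formatting.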
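On the receiving side, the fix is usually to decode the body before parsing it rather than to change the sender. A minimal sketch, assuming the parser is (or can call) Python and using the standard quopri module on the two lines from the question:

```python
import quopri

# The two encoded lines from the question; each trailing "=" is a soft line break
encoded = (
    b"CRIT - 93.2% used (466.06 of 500.0 GB), (levels at 80.00/90.00%), trend=\n"
    b": +5.66MB / 24 hours=\n"
)

# Decoding removes the soft line breaks and rejoins the original line
decoded = quopri.decodestring(encoded).decode("ascii")
print(decoded)
```

When the whole message is parsed with the email package instead, get_payload(decode=True) applies the same decoding based on the Content-Transfer-Encoding header.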

Encoding DOCX to be Sent via email

I am writing a little Perl app to send emails internally where I work, but I am having trouble with attachments. Now, I can hear all of you screaming "use MIME::Lite", but it ain't that easy: management rules say I am unable to use anything from CPAN...
Anyhow, I am using MIME::Base64 to encode any attachments I am sending, and Net::SMTP for the email side of things.
The encoding is done as follows (straight out of the perldoc page):
my $encoded = "";
use MIME::Base64 qw(encode_base64);
open (FILE, "7857084216_9816ae9bec_b.jpg") or die "$!\n";
while (read(FILE, $buf, 60*57)) {
    $encoded .= encode_base64($buf);
}
The relevant Net::SMTP code is as follows:
use Net::SMTP;
$smtp = Net::SMTP->new("mailserver",
    Hello => 'somedomain.com.au',
    Timeout => 60,
    Debug => 0,
);
$smtp->mail("services\@somedomain.com.au");
$smtp->to("me\@somedomain.com.au");
$smtp->cc("another\@somedomain.com.au");
$smtp->data();
$smtp->datasend("Subject: testing 123\n");
$smtp->datasend("To: me\@somedomain.com.au\n");
$smtp->datasend("CC: another\@somedomain.com.au\n");
$smtp->datasend("MIME-Version: 1.0\n");
$smtp->datasend("Content-type: multipart/mixed;\n\tboundary=\"$boundary\"\n");
$smtp->datasend("--$boundary\n");
$smtp->datasend("Content-type: text/html; charset=utf-8\n");
$smtp->datasend("\n");
$smtp->datasend("<p> Test Email!</p>\n");
$smtp->datasend("\n");
$smtp->datasend("--$boundary\n");
$smtp->datasend("Content-type: image/jpeg; name=\"7857084216_9816ae9bec_b.jpg\"\n");
$smtp->datasend("Content-ID: \n");
$smtp->datasend("Content-Disposition: attachment; filename=\"7857084216_9816ae9bec_b.jpg\"\n");
$smtp->datasend("Content-transfer-encoding: base64\n");
$smtp->datasend("\n");
$smtp->datasend("$encoded");
$smtp->datasend("\n");
$smtp->datasend("--$boundary--\n");
$smtp->dataend;
$smtp->quit;
At the moment, I am directing these emails to myself and reading them using Outlook 2010.
When sending images (such as jpegs), I am not having any issues at all - everything seems to decode OK byte for byte as far as I can see.
When I am sending docx type files with just plain text, everything seems fine.
But, when I am sending docx files with images inserted, the files are corrupting.
Not being an expert in mail sending and attaching, I am at a bit of a loss. How should I be encoding docx files for attaching to emails? Any help would be appreciated!
Also forgot to mention, I have attempted to set the content-type accordingly: Content-type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
I've seen this happen before, and it usually comes down to an incorrect MIME type or MIME structure. You mention you tried adding the MIME type, so that might indicate that the MIME structure is a little off. I've seen this with xlsx and csv files. CSV would come in fine since the decoder assumes text, but if you don't have the right MIME type for binary data it will always attempt to convert to ASCII. I realize this is about xlsx rather than docx, but I think the same principles apply.
Here is a snippet of a sample email I'm generating (using MIME::Lite) that might help.
From: me@example.com
To: example@example.com
Subject: Example
Content-Transfer-Encoding: binary
Content-Type: multipart/mixed; boundary="_----------=_13433203262384120"

--_----------=_13433203262384120
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html
Content-Disposition: inline

<body>Hello,<br>
<p>EMAIL BODY</p>
<p>Thanks,<br> Blah</body>

--_----------=_13433203262384120
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; name="sample.xlsx"
Content-Disposition: attachment; filename="sample.xlsx"
Content-Transfer-Encoding: base64

UEsDBBQAAAAIAAFk+kDm/EEvVgEAACQFAAATAAAAW0NvbnRlbnRfVHlwZXNdLnhtbMVUS28CIRC+
N+l/IFybXdRD0zSuHvo4tia1P4DC6BJZIAxa/fed3a32EesjmvQCgfleQ8j0h8vKsgVENN4VvJt3
OAOnvDZuWvDX8WN2wxkm6bS03kHBV4B8OLi86I9XAZAR22HBy5TCrRCoSqgk5j6Ao8rEx0omOsap
CFLN5BREr9O5Fsq7BC5lqdbgg/49TOTcJvawpOs2SQSLnN21wNqr4DIEa5RMVBcLp3+5ZJ8OOTEb
DJYm4BUBOBNbLZrSnw5r4jM9TjQa2EjG9CQrggnt1Sj6gIII+W6ZLUH9ZGIUkMa8IkoOdSINOgsk
CTEZ+Eq901z5CMe7r5+pZh9qubQC08oCntwshghSYwmQKpu3ovusE30qaNfuyQEamX2O7z7O3ryf
.... <Snipped>
ZRQyppbEwDToT079O3NfFSgofVlmcbvM34PqkQQdCmrLiWYxleokviz25OPYvpRw/s64avTwn+uh
SSg4ctedMMaTkj47g3IXX1BLAQIUAxQAAAAIAAFk+kDm/EEvVgEAACQFAAATAAAAAAAAAAEAAAC2
gQAAAABbQ29udGVudF9UeXBlc10ueG1sUEsBAhQDFAAAAAgAAWT6QPWzn3yRAQAAQAMAABAAAAAA
AAAAAQAAALaBhwEAAGRvY1Byb3BzL2FwcC54bWxQSwECFAMUAAAACAABZPpAkg6xSFoBAAC2AgAA
EQAAAAAAAAABAAAAtoFGAwAAZG9jUHJvcHMvY29yZS54bWxQSwECFAMUAAAACAABZPpAyfySzFpy
BAAcqRgAFAAAAAAAAAABAAAAtoHPBAAAeGwvc2hhcmVkU3RyaW5ncy54bWxQSwECFAMUAAAACAAB
ZPpARQtWMQgCAABaBQAADQAAAAAAAAABAAAAtoFbdwQAeGwvc3R5bGVzLnhtbFBLAQIUAxQAAAAI
AABk+kCVKtbcsAEAAFgDAAAPAAAAAAAAAAEAAAC2gY55BAB4bC93b3JrYm9vay54bWxQSwECFAMU
AAAACAABZPpA8XOrpLUFAABVGwAAEwAAAAAAAAABAAAAtoFrewQAeGwvdGhlbWUvdGhlbWUxLnht
bFBLAQIUAxQAAAAIAO1j+kD+7E73UA8AAM1zAAAYAAAAAAAAAAEAAAC2gVGBBAB4bC93b3Jrc2hl
ZXRzL3NoZWV0MS54bWxQSwECFAMUAAAACAAAZPpA7/olcLO/MgAHxsABGAAAAAAAAAABAAAAtoHX
kAQAeGwvd29ya3NoZWV0cy9zaGVldDIueG1sUEsBAhQDFAAAAAgAAWT6QCFo34b0AAAATgMAABoA
AAAAAAAAAQAAALaBwFA3AHhsL19yZWxzL3dvcmtib29rLnhtbC5yZWxzUEsBAhQDFAAAAAgAAWT6
QIB6g4TsAAAAUQIAAAsAAAAAAAAAAQAAALaB7FE3AF9yZWxzLy5yZWxzUEsFBgAAAAALAAsAxgIA
AAFTNwAAAA==
--_----------=_13433203262384120--
I have found the issue, and the Content-type does seem to make a difference. Based on what I found, I replaced every Content-type header on a part that contains Base64-encoded data with the following:
Content-type: application/octet-stream;
This sent the attachments straight through, with no issues in either Outlook 2010 or 2003. I have tested it with binary files, plain text, etc., and all seems to work OK in my current environment.
Cheers
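For comparison, here is a hedged sketch of the same idea in Python's standard email package (the thread itself uses Perl, so this is only an illustration). It attaches arbitrary binary data as application/octet-stream, serializes the message, parses it back, and checks that the attachment survives byte for byte. The data and the sample.docx filename are stand-ins, not a real document.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Stand-in for the bytes of a .docx file (hypothetical data, not a real document)
data = bytes(range(256)) * 4

msg = EmailMessage()
msg["Subject"] = "testing 123"
msg.set_content("Test Email!")

# Bytes content is base64-encoded by default; the message becomes multipart/mixed
msg.add_attachment(
    data,
    maintype="application",
    subtype="octet-stream",
    filename="sample.docx",
)

# Round trip: serialize, parse again, and compare the decoded attachment
parsed = message_from_bytes(msg.as_bytes(), policy=policy.default)
attachment = next(parsed.iter_attachments())
print(attachment.get_content_type(), attachment["Content-Transfer-Encoding"])
print(attachment.get_content() == data)
```

A round-trip check like this can help separate encoding problems on the sending side from decoding quirks in a particular mail client.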