Argh, this is not easy. I'm trying to parse some mails with Perl. Here is an example:
From: abc@def.de
Content-Type: multipart/mixed;
boundary="----_=_NextPart_001_01CBE273.65A0E7AA"
To: ghi@def.de
This is a multi-part message in MIME format.
------_=_NextPart_001_01CBE273.65A0E7AA
Content-Type: multipart/alternative;
boundary="----_=_NextPart_002_01CBE273.65A0E7AA"
------_=_NextPart_002_01CBE273.65A0E7AA
Content-Type: text/plain;
charset="UTF-8"
Content-Transfer-Encoding: base64
[base64-content]
------_=_NextPart_002_01CBE273.65A0E7AA
Content-Type: text/html;
charset="UTF-8"
Content-Transfer-Encoding: base64
[base64-content]
------_=_NextPart_002_01CBE273.65A0E7AA--
------_=_NextPart_001_01CBE273.65A0E7AA
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----_=_NextPart_003_01CBE272.13692C80"
From: bla@bla.de
To: xxx@xxx.de
This is a multi-part message in MIME format.
------_=_NextPart_003_01CBE272.13692C80
Content-Type: multipart/alternative;
boundary="----_=_NextPart_004_01CBE272.13692C80"
------_=_NextPart_004_01CBE272.13692C80
Content-Type: text/plain;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
=20
Viele Gr=FC=DFe
------_=_NextPart_004_01CBE272.13692C80
Content-Type: text/html;
charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
<html>...</html>
------_=_NextPart_004_01CBE272.13692C80--
------_=_NextPart_003_01CBE272.13692C80
Content-Type: application/x-zip-compressed;
name="abc.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="abc.zip"
[base64-content]
------_=_NextPart_003_01CBE272.13692C80--
------_=_NextPart_001_01CBE273.65A0E7AA--
This mail was sent from Outlook with another message attached. As you can see, it is a fairly complex mail with many different content types (text/plain, text/html, message/rfc822, application/x-zip-compressed).
The message/rfc822 part is the problem. I've written a script in Perl 5.8 (Debian Squeeze) that parses this message with MIME::Parser:
use strict;
use warnings;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory
my $top_entity = $parser->parse(\*STDIN);
my $plain_body = "";
my $html_body  = "";
foreach my $part ($top_entity->parts_DFS) {
    my $content_type = $part->effective_type;
    my $body         = $part->bodyhandle;    # undef for multipart containers
    next unless $body;
    if ($content_type eq 'text/plain') {
        $plain_body .= "\n" if $plain_body ne '';
        $plain_body .= $body->as_string;
    } elsif ($content_type eq 'text/html') {
        $html_body .= "\n" if $html_body ne '';
        $html_body .= $body->as_string;
    }
}
# parsing of attachment comes later
print $plain_body;
The first message part (base64 content) contains German umlauts, which are displayed correctly on STDOUT. MIME::Parser parses the nested message/rfc822 message automatically and exposes its parts together with the top-level body as one entity. This nested message also contains German umlauts, in quoted-printable as you can see, but these are not displayed correctly on STDOUT. When I call
utf8::encode($plain_body);
before the print, the quoted-printable umlauts are displayed correctly, but the base64-encoded ones are not. I have been trying for hours to extract the message/rfc822 part separately and re-encode it, but nothing helps. Can anyone help?
Regards
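Assuming that your console displays UTF-8, this makes sense.
The script prints the bytes exactly as they were decoded. The base64 part is declared as UTF-8, so it displays correctly; the quoted-printable part is declared as iso-8859-1 (Latin-1), which a UTF-8 console cannot display correctly.
Calling utf8::encode afterwards converts the Latin-1 bytes to UTF-8, which fixes that part, but it also encodes the data that was already UTF-8 a second time. So now only the former Latin-1 umlauts display correctly.
There is no way to get this right without looking at the "charset" parameter in each part's Content-Type and decoding accordingly.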
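To make that last point concrete, here is a minimal sketch using only core modules. The helper name decode_part and its three arguments are my own invention, not part of MIME::Parser. With MIME::Parser the transfer encoding is normally already undone by the time you call bodyhandle, so there you would only need the Encode::decode step, with the charset read from the part's header (I believe $part->head->mime_attr('content-type.charset') does that, but check the MIME::Head documentation).

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical helper: turn a raw MIME body into a Perl character string.
sub decode_part {
    my ($raw, $transfer_encoding, $charset) = @_;
    my $cte = lc($transfer_encoding // '7bit');

    # Step 1: undo the transfer encoding to get the original bytes.
    my $bytes = $cte eq 'base64'           ? decode_base64($raw)
              : $cte eq 'quoted-printable' ? decode_qp($raw)
              :                              $raw;

    # Step 2: interpret those bytes in the charset the part declares.
    return decode($charset // 'us-ascii', $bytes);
}

# Sample values for illustration: the same umlauts in two encodings.
my $latin1 = decode_part('Viele Gr=FC=DFe', 'quoted-printable', 'iso-8859-1');
my $utf8   = decode_part('R3LDvMOfZQ==',    'base64',           'UTF-8');

# Step 3: encode exactly once, on output, to what the console expects.
binmode(STDOUT, ':encoding(UTF-8)');
print "$latin1\n$utf8\n";
```

Both parts end up as ordinary character strings, so they can be concatenated safely and are encoded only once, when printed.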
I have a procedure that sends mail like this:
use POSIX qw(strftime);
use Sys::Hostname;
use Net::SMTP;

sub SendMail {
    my $subject = shift;
    my @message = @_;
    my $MIME_BOUNDARY = '====Multipart.Boundary.689464861147414354====';
    my $now = strftime("%Y-%m-%d %H:%M:%S", localtime);
    my @addresses = split(",", $ENV{ADMIN_MAIL});
    my $sender = $ENV{USER} || $ENV{USERNAME};
    $sender .= "\@" . hostname();
    my $smtp = Net::SMTP->new($ENV{MAILHOST} || 'mailhost', Debug => 1);
    unless ( $smtp ) {
        die "Error while sending notification mail. Not connected to SMTP server.";
    }
    $smtp->mail( $addresses[0] );
    $smtp->recipient( @addresses );
    $smtp->data;
    $smtp->datasend("From: $sender\n");
    $smtp->datasend("To: " . join(",", @addresses) . "\n");
    $smtp->datasend("Subject: $subject\n");
    $smtp->datasend("Date: " . strftime("%a, %d %b %Y %H:%M:%S %z", localtime) . "\n");
    if ( @log_messages ) {
        $smtp->datasend("Mime-Version: 1.0\n");
        $smtp->datasend("Content-Type: multipart/mixed; boundary=\"$MIME_BOUNDARY\"\n");
        $smtp->datasend("This is a multipart message in MIME format.\n");
        $smtp->datasend("--$MIME_BOUNDARY\n");
    }
    $smtp->datasend("Content-type: text/plain; charset=UTF-8\n");
    $smtp->datasend("Content-Disposition: quoted-printable\n");
    $smtp->datasend("\n");
    foreach ( @message ) { $smtp->datasend("$_\n") }
    $smtp->datasend("\n\n");
    $smtp->datasend("Message from " . hostname() . " (PID=$$) sent by 'LogDumper.pl' at $now");
    $smtp->datasend("\n");
    if ( @log_messages ) {
        $smtp->datasend("\n");
        $smtp->datasend("--$MIME_BOUNDARY\n");
        $smtp->datasend("Content-Type: text/plain; name=\"logs.txt\"\n");
        $smtp->datasend("Content-Disposition: attachment; filename=\"logs.txt\"\n");
        $smtp->datasend("\n");
        foreach ( @log_messages ) { $smtp->datasend("$_\n") }
        $smtp->datasend("\n");
        $smtp->datasend("--$MIME_BOUNDARY--\n");
    }
    $smtp->dataend;
    $smtp->quit;
}
The procedure works fine with plain text mails, i.e. with an empty @log_messages. However, if I try to attach a text file
my @log_messages;
push @log_messages, "Line 1";
push @log_messages, "Line 2";
SendMail("The Subject", "The Message");
then the mail is not sent.
Debug output is this:
Net::SMTP=GLOB(0x7a78a40)<<< 354 Start mail input; end with <CRLF>.<CRLF>
Net::SMTP=GLOB(0x7a78a40)>>> From: Domscheit@xxxxx
Net::SMTP=GLOB(0x7a78a40)>>> To: wernfried.domscheit@xxxxx.xxx
Net::SMTP=GLOB(0x7a78a40)>>> Subject: The Suject
Net::SMTP=GLOB(0x7a78a40)>>> Date: Mo, 01 Okt 2018 10:15:57 W. Europe Daylight Time
Net::SMTP=GLOB(0x7a78a40)>>> Mime-Version: 1.0
Net::SMTP=GLOB(0x7a78a40)>>> Content-Type: multipart/mixed; boundary="====Multipart.Boundary.689464861147414354===="
Net::SMTP=GLOB(0x7a78a40)>>> This is a multipart message in MIME format.
Net::SMTP=GLOB(0x7a78a40)>>> --====Multipart.Boundary.689464861147414354====
Net::SMTP=GLOB(0x7a78a40)>>> Content-type: text/plain; charset=UTF-8
Net::SMTP=GLOB(0x7a78a40)>>> Content-Disposition: quoted-printable
Net::SMTP=GLOB(0x7a78a40)>>> The Message
Net::SMTP=GLOB(0x7a78a40)>>> Message from xxxxx (PID=8072) sent by 'LogDumper.pl' at 2018-10-01 10:15:54
Net::SMTP=GLOB(0x7a78a40)>>> --====Multipart.Boundary.689464861147414354====
Net::SMTP=GLOB(0x7a78a40)>>> Content-Type: text/plain; name="logs.txt"
Net::SMTP=GLOB(0x7a78a40)>>> Content-Disposition: attachment; filename="logs.txt"
Net::SMTP=GLOB(0x7a78a40)>>> Line 1
Net::SMTP=GLOB(0x7a78a40)>>> Line 2
Net::SMTP=GLOB(0x7a78a40)>>> --====Multipart.Boundary.689464861147414354====--
Net::SMTP=GLOB(0x7a78a40)>>> .
Net::SMTP: Unexpected EOF on command channel at C:\Developing\Source\LogDumper.pl line 1271.
Apparently there is a missing or needless \n. I tried to put it almost everywhere but I don't get it working.
Update:
Actually, it works when I execute the script on Linux, but not on my Windows development PC.
I think the problem is solved:
It works on the production server running Linux.
It works on my Windows development PC when I am in the office.
It does not work on Windows when I am working from home and connected via VPN (nasty security department...).
Well, it is not really solved, but it works well enough for me. I have not yet tested whether it also fails over VPN with one of the higher-level packages.
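Independent of the VPN issue, the message built by SendMail has, as far as I can tell, two MIME problems that some servers and clients tolerate and others may not. In the multipart case there is no empty line between the top-level headers and the "This is a multipart message" preamble, and "Content-Disposition: quoted-printable" is not a valid disposition (presumably Content-Transfer-Encoding was meant, and the text is not actually quoted-printable encoded anyway). Below is a sketch of the layout I would expect. The helper name build_message, its arguments, and the example addresses are hypothetical; it only builds a string and does not talk to any server.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the whole message (headers and body) as one string.
# $header_lines and $log_lines are array refs, $text is the message text.
sub build_message {
    my ($header_lines, $text, $log_lines) = @_;
    my $boundary = '====Multipart.Boundary.689464861147414354====';
    my @lines = (
        @$header_lines,
        'MIME-Version: 1.0',
        qq{Content-Type: multipart/mixed; boundary="$boundary"},
        '',                                   # empty line: end of top-level headers
        'This is a multipart message in MIME format.',
        "--$boundary",
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: 8bit',    # text is sent as-is, not encoded
        '',                                   # empty line: end of part headers
        $text,
        "--$boundary",
        'Content-Type: text/plain; name="logs.txt"',
        'Content-Disposition: attachment; filename="logs.txt"',
        '',
        @$log_lines,
        "--$boundary--",                      # closing delimiter
    );
    return join("\n", @lines) . "\n";
}

my $msg = build_message(
    [ 'From: sender@example.com', 'To: admin@example.com', 'Subject: The Subject' ],
    'The Message',
    [ 'Line 1', 'Line 2' ],
);
print $msg;
```

With Net::SMTP, a string like this could be handed to a single $smtp->datasend($msg) call; datasend is documented to convert LF to CRLF, which is why the sketch joins with "\n".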
I want to post data with content type multipart/form-data:
use strict;
use warnings;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->post (
'http://127.0.0.1:12555',
'Content-Type' => 'form-data',
Content => {
'data1' => rand,
'data2' => rand,
}
);
And I inspected the submitted data with this listener:
use strict;
use warnings;
use IO::Socket::INET;
use Socket qw ( inet_aton );
my $sock_listen = new IO::Socket::INET (
LocalHost => '127.0.0.1',
LocalPort => '12555',
Proto => 'tcp',
Listen => 3,
Reuse => 1,
);
$sock_listen->autoflush ();
my $sock;
while ( $sock = $sock_listen->accept ( ) )
{
my $data = '';
$sock->recv ( $data, 4096 );
print $data . "\n";
}
Test #1 result:
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: 127.0.0.1:12555
User-Agent: libwww-perl/6.05
Content-Length: 162
Content-Type: multipart/form-data; boundary=xYzZY
--xYzZY
Content-Disposition: form-data; name="data2"
0.876556396484375
--xYzZY
Content-Disposition: form-data; name="data1"
0.62921142578125
--xYzZY--
Test #2 result:
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: 127.0.0.1:12555
User-Agent: libwww-perl/6.05
Content-Length: 163
Content-Type: multipart/form-data; boundary=xYzZY
--xYzZY
Content-Disposition: form-data; name="data2"
0.896942138671875
--xYzZY
Content-Disposition: form-data; name="data1"
0.041656494140625
--xYzZY--
I added another field:
'data3' => '--xYzZY'
and got:
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: 127.0.0.1:12555
User-Agent: libwww-perl/6.05
Content-Length: 221
Content-Type: multipart/form-data; boundary=Tegj
--Tegj
Content-Disposition: form-data; name="data2"
0.34613037109375
--Tegj
Content-Disposition: form-data; name="data3"
--xYzZY
--Tegj
Content-Disposition: form-data; name="data1"
0.678955078125
--Tegj--
The question is: how can I set the boundary manually to a 32-character string, like a browser's ----WebKitFormBoundary[...], using LWP?
Or should I just use IO::Socket?
LWP allows you to set the boundary manually when you do multipart/form-data requests. This feature is unfortunately not documented at all.
However, you have to request multipart explicitly. You can set your own boundary by appending it as an additional parameter of the Content-Type. HTTP::Request::Common will convert it to a header appropriately.
my $ua = LWP::UserAgent->new;
$ua->post(
'http://127.0.0.1:12555',
'Content-Type' =>
'multipart/form-data;boundary=Nobody-has-the-intention-to-erect-a-wall',
# ^^^^^^^ ^^^^^^^^
Content => {
data1 => rand,
data2 => rand,
},
);
With your listener, this results in the following output:
POST / HTTP/1.1
TE: deflate,gzip;q=0.3
Connection: TE, close
Host: 127.0.0.1:12555
User-Agent: libwww-perl/6.15
Content-Length: 269
Content-Type: multipart/form-data; boundary=Nobody-has-the-intention-to-erect-a-wall
--Nobody-has-the-intention-to-erect-a-wall
Content-Disposition: form-data; name="data2"
0.0575856828104122
--Nobody-has-the-intention-to-erect-a-wall
Content-Disposition: form-data; name="data1"
0.677908250902878
--Nobody-has-the-intention-to-erect-a-wall--
Note that HTTP::Request::Common will replace your boundary with a random string if it finds the boundary string in the body of any of the parts. It will not just add a number to your boundary.
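To illustrate the idea behind that check (this is not HTTP::Request::Common's actual code, just a sketch of the same principle): keep the preferred boundary unless it occurs in one of the parts, otherwise draw random candidates until one is collision-free. The helper name safe_boundary and the 32-character length are my own choices, the latter picked to match the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return a boundary that occurs in none of @parts.
sub safe_boundary {
    my ($preferred, @parts) = @_;
    my @alphabet = ('A'..'Z', 'a'..'z', 0..9);
    my $boundary = $preferred;
    # Retry with a fresh random 32-character string while any part contains it.
    while (grep { index($_, $boundary) >= 0 } @parts) {
        $boundary = join '', map { $alphabet[rand @alphabet] } 1..32;
    }
    return $boundary;
}

# No part contains 'xYzZY', so the preferred boundary is kept.
my $kept = safe_boundary('xYzZY', '0.8765', '0.6292');

# One part contains 'xYzZY', so a random replacement is generated.
my $replaced = safe_boundary('xYzZY', '0.3461', '--xYzZY', '0.6789');

print "$kept\n$replaced\n";
```

This mirrors what the question observed: adding a field whose value contains the default boundary made the boundary change.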
The sole purpose of the boundary is to separate the message parts and the only requirement on it is that it doesn't appear anywhere in the message. I don't see a good reason to attempt to set it to be "the same" as anything else. Besides, no tool guarantees that it will always use the same one.
More importantly, setting it to a fixed string (without regard for the message) is dangerous: how does anyone know that such a string may not be in a message?
Finally, I don't think it is possible to do so, precisely because the boundary must be checked to ensure that it indeed isn't in the message; so no tool should provide a way to set it to a predefined string.
Have a look at HTTP::Request::Common's source. See how the sub boundary() badly mangles the string to return, and how much work goes into the boundary elsewhere. Then CHECK_BOUNDARY: block changes it further if it isn't good enough. This is clearly not meant to be set outside.
The post method of LWP::UserAgent is a shortcut for this module's POST function.
Note that simbabque found a way to set the boundary, which then also undergoes the checks.
I'm a Perl novice trying to figure out how to decode a MIME-encoded email with multiple parts. I'm not sure of the conventions here, so I'll just include the pieces of the email that I believe are relevant:
Content-Type: multipart/mixed; boundary="===============3385789078715843912=="
Mime-Version: 1.0
--===============3385789078715843912==
Content-Type: multipart/signed; micalg="pgp-sha256";
protocol="application/pgp-signature"; boundary="=-0+dmFxz+BsFOEAAxvudu"
--=-0+dmFxz+BsFOEAAxvudu
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: base64
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT0KVWJ1bnR1IFNlY3VyaXR5IE5vdGljZSBVU04tMzIxMC0xCkZlYnJ1
YXJ5IDIzLCAyMDE3CgpMaWJyZU9mZmljZSB2dWxuZXJhYmlsaXR5Cj09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
CgpBIHNlY3VyaXR5IGlzc3VlIGFmZmVjdHMgdGhlc2UgcmVsZWFzZXMgb2YgVWJ1bnR1IGFuZCBp
dHMgZGVyaXZhdGl2ZXM6CgotIFVidW50dSAxNi4wNCBMVFMKLSBVYnVudHUgMTQuMDQgTFRTCi0g
I've got the following bit of code:
my $msg = Email::MIME->new($buf);
for my $part ($msg->parts) {
    if ($part->content_type =~ m!multipart/mixed!
        or $part->content_type eq '' )
    {
        print "Found Multipart";
        for my $subpart ($part->parts) {
            print $subpart->body;
        }
    }
}
I really don't know what to do next. I've tried a dozen different variations on this and haven't gotten any closer after four hours of working on it. I'd appreciate it if someone could help me identify the proper Perl modules and functions for reading this text sub-part of a signed email.
The documentation of Email::MIME advises against using parts and calls it a stupid method: for a single-part message it returns the original object itself, which is surprising.
Instead, use the subparts method to get the parts of the email, then call it again on each part to iterate over that part's own parts. If a part has no sub-parts, the inner loop simply does not run. Print the body of each sub-part and you're done.
foreach my $part ( $msg->subparts ) {
    foreach my $sub_part ( $part->subparts ) {
        print $sub_part->body;
    }
}
I am trying to change the body text of a part inside a multi-part MIME email using Email::MIME's (1.926) walk_parts and body_set.
The change is there but when sending the mail the old/non-changed mail text is being sent.
The question is: What do I have to do to 'activate' my changes?
See:
use Email::MIME;
my $raw_message_text = q!Date: Wed, 26 Feb 2014 08:02:39 +0100
From: Me <me@example.com>
To: You <you@example.com>
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="------------010309070301040606000908"
This is a multi-part message in MIME format.
--------------010309070301040606000908
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
this is a test
--------------010309070301040606000908
Content-Type: text/plain;
name="file-to-attach.txt"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="file-to-attach.txt"
dGV4dCBpbnNpZGUgYXR0YWNobWVudAoK
--------------010309070301040606000908--
!;
my $parsed_email = Email::MIME->new($raw_message_text);
$parsed_email->walk_parts(sub {
    my ($part) = @_;
    return if $part->subparts; # multipart
    return unless ($part->content_type =~ /text\/plain.*charset=utf-8/i);
    $part->body_set("new body text");
});
print "As you see the change is there:\n";
$parsed_email->walk_parts(sub {
    my ($part) = @_;
    return if $part->subparts; # multipart
    my $body = $part->body;
    print "Body:$body\n";
});
print "But the email object itself doesn't notice that:\n\n";
print $parsed_email->as_string;
This first shows the changed body text, so you can see it is there. But when the whole mail is printed, the old body text is used. The same happens if I just send the email using Email::Sender. So I wonder what the correct usage of body_set is...
I stumbled upon this problem as well. Eventually I realized that all that's missing from the original poster's code is the following:
my @new_parts = $parsed_email->parts;
$parsed_email->parts_set( \@new_parts );
Add the above before the final as_string call, and you are good.
walk_parts doesn't seem to work properly, so I had to use the old classic method. I'm not sure whether something is broken in the new version, but with this method it works; you'll just need to adapt your code accordingly.
This solution isn't efficient at all, and I know it is heavy on memory, but I'm lazy. I think I should look for another library for this.
my @parts = $parsed->subparts;
my @new_parts;
if (@parts) {
    foreach (@parts)
    {
        my $part = $_;
        print $part->content_type."\r\n";
        if ($part->content_type =~ /text\/plain.*charset=utf-8/i) {
            $part->body_set("new body text");
            push @new_parts, $part;
        } else {
            push @new_parts, $part;
        }
    }
} else {
    print 'single part';#to replace for single mime
}
$parsed->parts_set(\@new_parts);
I'm sending a multipart HTML email using PHP's mail() function. In my Postfix configuration I have my SMTP server set to Amazon's SES. Here is the PHP for sending the email:
$boundary = uniqid("HTMLDEMO");
$headers = "From: me@mydomain.com\r\n";
$headers .= "MIME-Version: 1.0\r\n";
$headers .= "Content-Type: multipart/alternative; boundary = ".$boundary."\r\n\r\n";
// plain text
$content = "--".$boundary."\r\n" .
"Content-Type: text/plain; charset=ISO-8859-1\r\n" .
"Content-Transfer-Encoding: base64\r\n\r\n" .
chunk_split(base64_encode($plaintext_message));
// HTML
$content .= "--".$boundary."\r\n" .
"Content-Type: text/html; charset=ISO-8859-1\r\n" .
"Content-Transfer-Encoding: text/html \r\n\r\n" .
"<html><body>".$html_message."</body></html>";
//send message
mail($to, $subject, $content, $headers);
When I echo the message content, this is what I see in the browser:
--HTMLDEMO527d8d851e72f
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: base64
VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCB2ZXJzaW9uIQ==
--HTMLDEMO527d8d851e72f
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: text/html
<html><body><p>My message here.</p></body></html>
But when I view the message source in Gmail, I now see this (including the message headers):
From: me@mydomain.com
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary = HTMLDEMO527d8d851e72f
Message-ID: <blah-blah-blah@email.amazonses.com>
Date: Sat, 9 Nov 2013 01:19:02 +0000
X-SES-Outgoing: 2013.11.09-12.34.5.67
--HTMLDEMO527d8d851e72f
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: base64
VGhpcyBpcyB0aGUgcGxhaW4gdGV4dCB2ZXJzaW9uIQ==
--HTMLDEMO527d8d851e72f
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: text/html
The multipart headers are now double-spaced, causing the HTML to display as plain text. SES is clearly modifying the message headers (it added Message-ID, Date, and X-SES-Outgoing), so could that also be the culprit for the extra spaces in the multipart headers? When I send an identical email from a non-Amazon server, it comes through normally and renders the HTML like it should.
Also, when I send it as a simple HTML email (not multipart), then it works just fine.
Thanks.
I had the same issue and resolved it by changing the end-of-line sequence to "\n" instead of "\r\n".