LWP Get Large File Download Headers Missing - perl

This post is follow on work related to LWP GET large file download. That post was regarding an error from LWP when trying to pass arguments in the header incorrectly. Now I am posting the changes I made and how I am trying to debug the approach. This discussion should be very informative for those interested in POST vs GET header formation, and what the server receives while using the CGI package. It is not information easily found on the net.
Here is my client code snip:
my $bytes_received = 0; # vars used below are set prior to this point
my $filename = $opt{t}."/$srcfile";
open (FH, ">", "$filename") or $logger->error( "Couldn't open $filename for writing: $!" );
my $ua = LWP::UserAgent->new();
my $target = $srcfile;
my $res = $ua->get(
$url,
':content_cb' => \&callback,
'api' => 'olfs', # Note attempted use of different types of quotes had no impact
"cmd" => 'rfile',
"target" => $target,
"bs" => $bs
);
print $logger->info("$bytes_received bytes received");
sub callback{
my($chunk, $res) = #_;
$bytes_received += length($chunk);
print FH $chunk;
}
Here is the server snip (cgi script):
my $query = new CGI;
my $rcvd_data = Dumper($query);
print $rcvd_data;
Here is the output from a GET:
$VAR1 = bless( {
'.parameters' => [],
'use_tempfile' => 1,
'.charset' => 'ISO-8859-1',
'.fieldnames' => {},
'param' => {},
'.header_printed' => 1,
'escape' => 1
}, 'CGI' );
Here is a client with a POST request:
my $ua = new LWP::UserAgent();
local $HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1;
my $req =
POST
$url,
'Content_Type' => 'form-data',
'Content' => {
"api" => 'olfs',
"cmd" => 'wfile',
"target" => $target,
"tsize" => $file_size,
"bs" => $bs,
"filename" => [ $file ] };
# HTTP::Message calls set_content, which appears to set the subroutine for content
# LWP::UserAgent
# LWP::Protocol::file::request sends content in chunks
#
$req->content( $req->content() );
$logger->info("Uploading: $file");
my $resp = $ua->request($req);
Here is the output on the server, just like before but now from the POST:
'.parameters' => [
'cmd',
'bs',
'api',
'target',
'filename',
'tsize'
],
'use_tempfile' => 1,
'.tmpfiles' => {
'*Fh::fh00001random23' => {
'info' => {
'Content-Type' => 'text/plain',
'Content-Disposition' => 'form-data; name="filename"; filename="random23"'
},
'name' => bless( do{\(my $o = '/usr/tmp/CGItemp33113')}, 'CGITempFile' ),
'hndl' => bless( \*Fh::fh00001random23, 'Fh' )
}
},
'.charset' => 'ISO-8859-1',
'.fieldnames' => {},
'param' => {
'cmd' => [
'wfile'
],
'bs' => [
'buffer1'
],
'api' => [
'olfs'
],
'target' => [
'random23'
],
'tsize' => [
'1073741824'
],
'filename' => [
$VAR1->{'.tmpfiles'}{'*Fh::fh00001random23'}{'hndl'}
},
'escape' => 1,
'.header_printed' => 1
}, 'CGI' );
In short, you can see in the POST dump the "key" / "value" pairs, ie "target => random23". In the GET dump I do not find any keys or values from what I submitted on the client side. Can that be explained, or what do I need to do so as to extract key / value pairs in the CGI script?

You're passing your form variables as HTTP headers.
Like I previously mentioned, if you want to build a url, you can use URI.
$url = URI->new($url);
$url->query_form(
api => 'olfs',
cmd => 'rfile',
target => $target,
bs => $bs,
);

Related

How to send serialized post with Plack::Test?

Writing a REST application with perl Dancer2. I set the serializer setting to the format in code.
set serializer => 'JSON';
I wrote a test file to rest the application, but failure in POST.
REST application got KEY but null value.
DBD::Pg::st execute failed: ERROR: null value in column "email" of relation "owners" violates not-null constraint
How to set serialized format content in Plack::Test?
use strict;
use warnings;
use Test::More;
use Test::Deep;
use Plack::Test;
use Plack::Util;
use HTTP::Request::Common;
use JSON::MaybeXS qw(decode_json encode_json);
use Data::Dumper qw(Dumper);
use Storable qw(freeze thaw);
use utf8;
use MyApp;
my %data = (
password => 'A12345678',
email => 'test#test.com'
);
# APP Start
my $app = MyApp->to_app;
my $test = Plack::Test->create($app);
subtest register => sub {
print ">>> Test <<<\n";
my $datas = {
password => $data{password},
email => $data{email},
};
my $serialized_data = freeze($datas);
my $res = $test->request( POST '/api/v1/register', $serialized_data );
print Dumper $res;
};
done_testing();
Dumper $res =>
$VAR1 = bless( {
'_headers' => bless( {
'content-type' => 'application/json',
'server' => 'Perl Dancer2 0.400000',
'content-length' => 454
}, 'HTTP::Headers' ),
'_request' => bless( {
'_headers' => bless( {
'content-length' => 0,
'
12345678
test1#test.comemail
a1234567password' => undef,
'content-type' => 'application/x-www-form-urlencoded',
'::std_case' => {
'
12345678
test1#test.comemail
a1234567password' => '
12345678
Test1#Test.ComEmail
A1234567Password'
}
}, 'HTTP::Headers' ),
I tested this REST API with Postman is fine.

Data::Dumper::Freezer proper usage

I'm trying to log data structures in an old and big Perl project. In order to do so, I use Data::Dumper, however, some structures are a bit too large and spam the log. So I'm looking for a way to log them in a less verbose manner.
Now Dumper's doc mentions $Data::Dumper::Freezer = <method_name> variable that can be used to fix that. I've tried using that.
Adding a serializer method that returns "shortened" value results in nothing, though the method gets called. Making the serializer method act on $_[0] causes the needed effect, but spoils the original data structure.
I'm confused. What am I doing wrong? How can I fix it?
Here's a refined sample code:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Indent = 0;
$\="\n";
my $x = Foo->new ( answer => 42, use => "force" );
my $y = { foo => $x };
print "initial plain:\n\t", Dumper( $x );
print "initial compound:\n\t", Dumper( $y );
{
local $Data::Dumper::Freezer = 'freeze_pure';
print "still not abbreviated data:\n\t", Dumper( $y );
};
{
local $Data::Dumper::Freezer = 'freeze_replace';
print "abbreviated data:\n\t", Dumper( $y );
};
print "initial data is still intact:\n\t", Dumper( $x );
print "compound data is corrupted:\n\t", Dumper( $y );
package Foo;
sub new {
my $class = shift;
return bless { #_ }, $class;
};
sub freeze_pure {
my $self = $_[0];
warn "# In freeze_pure";
return bless {
values => join ",", values %$self
}, (ref $self) . "::short";
};
sub freeze_replace {
my $self = $_[0];
warn "# In freeze_replace";
$_[0] = bless {
values => join ",", values %$self
}, (ref $self) . "::short";
return;
};
And output:
initial plain:
$VAR1 = bless( {'use' => 'force','answer' => 42}, 'Foo' );
initial compound:
$VAR1 = {'foo' => bless( {'use' => 'force','answer' => 42}, 'Foo' )};
# In freeze_pure at dumper-freezer.pl line 36.
still not abbreviated data:
$VAR1 = {'foo' => bless( {'use' => 'force','answer' => 42}, 'Foo' )};
# In freeze_replace at dumper-freezer.pl line 42.
abbreviated data:
$VAR1 = {'foo' => bless( {'values' => 'force,42'}, 'Foo::short' )};
initial data is still intact:
$VAR1 = bless( {'use' => 'force','answer' => 42}, 'Foo' );
compound data is corrupted:
$VAR1 = {'foo' => bless( {'values' => 'force,42'}, 'Foo::short' )};
Although the documentation is a bit sparse, the intended use of freezer/toaster is data serialization/de-serialization, not prettification of debugging output.
So, Data::Dumper calls the freezer method, but doesn't use the return value. The idea is probably that if you're going to serialize an object, you won't be messing with it again until you de-serialize it, so there's no problem with changing the object itself.
Here's the relevant section of code from the Data::Dumper source:
# Call the freezer method if it's specified and the object has the
# method. Trap errors and warn() instead of die()ing, like the XS
# implementation.
my $freezer = $s->{freezer};
if ($freezer and UNIVERSAL::can($val, $freezer)) {
eval { $val->$freezer() };
warn "WARNING(Freezer method call failed): $#" if $#;
}
If you just want to reduce the size of the output in your logs, you can remove newlines and indentation by setting $Data::Dumper::Indent to zero:
use Data::Dumper;
use WWW::Mechanize;
$Data::Dumper::Indent = 0;
my $mech = WWW::Mechanize->new;
print Dumper $mech;
Output:
$VAR1 = bless( {'headers' => {},'ssl_opts' => {'verify_hostname' => 1},'forms' => undef,'page_stack' => [],'text' => undef,'requests_redirectable' => ['GET','HEAD','POST'],'timeout' => 180,'onerror' => sub { "DUMMY" },'current_form' => undef,'links' => undef,'max_redirect' => 7,'quiet' => 0,'images' => undef,'noproxy' => 0,'stack_depth' => 8675309,'show_progress' => undef,'protocols_forbidden' => undef,'no_proxy' => [],'handlers' => {'request_prepare' => bless( [{'owner' => 'LWP::UserAgent::cookie_jar','callback' => sub { "DUMMY" },'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:705'}], 'HTTP::Config' ),'response_header' => bless( [{'owner' => 'LWP::UserAgent::parse_head','callback' => sub { "DUMMY" },'m_media_type' => 'html','line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:684'}], 'HTTP::Config' ),'response_done' => bless( [{'owner' => 'LWP::UserAgent::cookie_jar','callback' => sub { "DUMMY" },'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:708'}], 'HTTP::Config' )},'onwarn' => sub { "DUMMY" },'protocols_allowed' => undef,'use_eval' => 1,'local_address' => undef,'autocheck' => 1,'title' => undef,'def_headers' => bless( {'user-agent' => 'WWW-Mechanize/1.75'}, 'HTTP::Headers' ),'cookie_jar' => bless( {'COOKIES' => {}}, 'HTTP::Cookies' ),'proxy' => {},'max_size' => undef}, 'WWW::Mechanize' );
This is still a lot of output, but it's certainly more compact than:
$VAR1 = bless( {
'headers' => {},
'ssl_opts' => {
'verify_hostname' => 1
},
'forms' => undef,
'page_stack' => [],
'text' => undef,
'requests_redirectable' => [
'GET',
'HEAD',
'POST'
],
'timeout' => 180,
'onerror' => sub { "DUMMY" },
'current_form' => undef,
'links' => undef,
'max_redirect' => 7,
'quiet' => 0,
'images' => undef,
'noproxy' => 0,
'stack_depth' => 8675309,
'show_progress' => undef,
'protocols_forbidden' => undef,
'no_proxy' => [],
'handlers' => {
'request_prepare' => bless( [
{
'owner' => 'LWP::UserAgent::cookie_jar',
'callback' => sub { "DUMMY" },
'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:705'
}
], 'HTTP::Config' ),
'response_header' => bless( [
{
'owner' => 'LWP::UserAgent::parse_head',
'callback' => sub { "DUMMY" },
'm_media_type' => 'html',
'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:684'
}
], 'HTTP::Config' ),
'response_done' => bless( [
{
'owner' => 'LWP::UserAgent::cookie_jar',
'callback' => sub { "DUMMY" },
'line' => '/home/foo/perl5/perlbrew/perls/perl-5.16.3/lib/site_perl/5.16.3/LWP/UserAgent.pm:708'
}
], 'HTTP::Config' )
},
'onwarn' => sub { "DUMMY" },
'protocols_allowed' => undef,
'use_eval' => 1,
'local_address' => undef,
'autocheck' => 1,
'title' => undef,
'def_headers' => bless( {
'user-agent' => 'WWW-Mechanize/1.75'
}, 'HTTP::Headers' ),
'cookie_jar' => bless( {
'COOKIES' => {}
}, 'HTTP::Cookies' ),
'proxy' => {},
'max_size' => undef
}, 'WWW::Mechanize' );
Alternatively, you could try Data::Dump, which allows you to filter the output using Data::Dump::Filtered. I prefer Data::Dump to Data::Dumper anyway because I think it has more sensible defaults (e.g. outputting escape sequences for whitespace other than spaces).
I haven't used the filtering feature yet, but brian d foy wrote a nice article about it with several examples.

Facebook status update using WWW::Mechanize

I am trying to update status on facebook using Mechanize.I am able to login using the script but unable to update.I verified the id of form for status update is "u_0_w".
But selecting the form_id method says "There is no form with ID "u_0_w"".
My script is this:
use WWW::Mechanize;
use strict;
use warnings;
use Data::Dumper;
use HTTP::Cookies::Netscape;
my $cookiesfilename='/home/xxx/xxx/cookies.txt';
my $out;
my $mech = WWW::Mechanize->new( cookie_jar => HTTP::Cookies::Netscape->new( file => $cookiesfilename ) );
$mech->get("https://www.facebook.com/login.php");
my $response=$mech->submit_form(
fields => {
email => 'xxxx#xxxx.com',
pass => 'xxxxx',
}
);
#my $array=$mech->forms();
#$mech->get('/home.php');
print Dumper($mech->forms());
#$mech->form_id("u_0_w");
$mech->submit_form(
fields => {
xhpc_message_text=>'Why so serious'
}
);
print $response->status_line;
open($out, ">", "output_page.html") or die "Can't open output_page.html: $!";
print $out $response->decoded_content;
Then I tried to print all the forms on the page using Dumper the output is:
$VAR1 = bless( {
'default_charset' => 'UTF-8',
'enctype' => 'application/x-www-form-urlencoded',
'accept_charset' => 'UNKNOWN',
'action' => bless( do{\(my $o = 'https://www.facebook.com/search/web/direct_search.php')}, 'URI::https' ),
'method' => 'GET',
'attr' => {
'method' => 'get'
},
'inputs' => [
bless( {
'tabindex' => '-1',
'value' => '1',
'class' => '_42ft _42fu _4w98',
'type' => 'submit'
}, 'HTML::Form::SubmitInput' ),
bless( {
'/' => '/',
'autocomplete' => 'off',
'tabindex' => '1',
'name' => 'q',
'aria-label' => 'Search Facebook',
'value_name' => '',
'class' => 'inputtext _586f',
'type' => 'text',
'id' => 'u_0_b',
'role' => 'combobox',
'placeholder' => 'Search Facebook'
}, 'HTML::Form::TextInput' )
]
}, 'HTML::Form' );
It means it is not detecting the status update form it is detecting only Facebook search form.
What may be the problem for mechanize not detecting all the form elements?
The form contains <button type="submit">. Do Mechanize support it?
Why do you have to use Mechanize for this? There's already a module available for this on CPAN.
Take a look at WWW::Facebook::API.
Also see a related question: How do I use Perl's WWW::Facebook::API to publish to a user's newsfeed?
Synopsis:
use WWW::Facebook::API;
my $facebook = WWW::Facebook::API->new(
desktop => 0,
api_key => $fb_api_key,
secret => $fb_secret,
session_key => $query->cookie($fb_api_key.'_session_key'),
session_expires => $query->cookie($fb_api_key.'_expires'),
session_uid => $query->cookie($fb_api_key.'_user')
);
my $response = $facebook->stream->publish(
message => qq|Test status message|,
);

How to access individual elements of perl Hashed object?

I'm a non-programmer attemting to retrieve useful info from our InfoBlox DHCP boxes. I've installed the Perl API and can make some use of it.
I've got an output from the Data::Dumper "thingie" that appears to have some of the info I want. I'd like to directly reference some of that data but I'm unsure how.
print Dumper(\$object)
Here is part of the Data::Dumper output;
$VAR1 = \bless( {
'network' => '10.183.1.0/24',
'override_lease_scavenge_time' => 'false',
'enable_ifmap_publishing' => 'false',
'low_water_mark_reset' => '10',
'use_lease_time' => 0,
'use_enable_option81' => 0,
'network_container' => '/',
'override_ddns_ttl' => 'false',
'rir' => 'NONE',
'network_view' => bless( {
<snip> --------------------------------------
'extattrs' => {
'Use' => bless( {
'value' => 'Voip'
}, 'Infoblox::Grid::Extattr' )
},
<snip> --------------------------------------
'members' => [
bless( {
'ipv4addr' => '10.85.9.242',
'name' => 'ig3-app3.my.net'
}, 'Infoblox::DHCP::Member' ),
bless( {
'ipv4addr' => '10.85.9.210',
'name' => 'ig3-app1.my.net'
}, 'Infoblox::DHCP::Member' ),
bless( {
'ipv4addr' => '10.85.9.226',
'name' => 'ig3-app2.my.net'
}, 'Infoblox::DHCP::Member' )
],
'override_ignore_client_identifier' => 'false',
'email_list' => undef,
'rir_registration_status' => '??
}, 'Infoblox::DHCP::Network' );
How do I view the elements? ie ...
print $object{members->name};
print $object{members->ipv4addr};
print $object{extattrs->Use->value};
I've found the API dox insufficiant for my skill level:) The data I'd like to pull remains just out of reach.
my #retrieved_objs = $session->search (
object => "Infoblox::DHCP::Network",
network => '.*\.*\.*\..*',
);
foreach $object ( #retrieved_objs ) {
my $network = $object->network;
my $comment = $object->comment;
my $extattrs = $object->extattrs;
my $options = $object->options;
print $network, " network ", $comment, " ", $extattrs, " ", $options, "\n";
}
-------- output ---
10.183.2.0/24 network HASH(0x6a2f038) ARRAY(0x1d20eb0)
10.192.1.0/24 network HASH(0x9df6540) ARRAY(0x9df5468)
10.192.2.0/24 network HASH(0xa088fc8) ARRAY(0xa089718)
You shouldn't try to access the internal values of an object directly. The module - in this case Infoblox::DHCP::Network will provide methods that allow you to read or manipulate the values properly.

trying to parse text and html from email over IMAP using MAIL::IMAPClient but text is hidden in parts and multi-parts

I can connect to the IMAP mail server easy enough:
use Mail::IMAPClient;
use MIME::Base64;
use MIME::Parser;
my $imap = Mail::IMAPClient->new(
Server => '192.168.2.2',
User => 'xxxxxx',
Password => 'yyyyyy',
Ssl => 1,
Uid => 1,
);
my $folders = $imap->folders
or die "List folders error: ", $imap->LastError, "\n";
print "Folders: #$folders\n";
$sfolder="INBOX.2012";
$imap->select( $sfolder )
or die "Select '$Opt{sfolder}' error: ", $imap->LastError, "\n";
my #msgs = $imap->messages or die "Could not messages: $#\n";
However, the text and html I want is not easily parsed due to codes like this:
Content-Transfer-Encoding:base64
Content-Type:text/html; charset=utf-8
Content-Transfer-Encoding:base64
Content-Type:text/html; charset=utf-8
Content-Transfer-Encoding:
Content-Type:multipart/mixed; boundary="----------=_4F0F4830.7079357A"
Multipart
Content-Transfer-Encoding:
Content-Type:multipart/mixed; boundary="----=_Part_4487195_1184536749.1326753403034"
Multipart
Content-Transfer-Encoding:
Content-Type:multipart/alternative; boundary=--boundary_164442_d184e417-739f-
46d6-824a-6ea1846e79de
Multipart
Content-Transfer-Encoding:
Content-Type:multipart/mixed; boundary="----=_Part_3882878_23916831.1326509484032"
Multipart
Content-Transfer-Encoding:
I tried this but it only works on a tiny number of different encodings.
if ($imap->get_header($msg,"Content-Transfer-Encoding")=~ /base64/i) {
print "\nMatch base64";
if ($imap->get_header($msg,"Content-Type")=~m/text/i ) {
push(#mail,decode_base64($imap->body_string($msg)));
}
elsif ($imap->get_header($msg,"Content-Type")=~m/image/i )
{ print "\nImage detected"; }
elsif ($imap->get_header($msg,"Content-Type")=~m/application/i )
{ print "\nApplication detected"; }
There are 7bit and 8bit variants and other encoding methods that contain the html or text I want for later use. I successfully use decode_base64() to decode base64. The worse ones to decode are the ones that contain multi-part codes. I feel like I am re-inventing the wheel and there must be a library or module that can do all the heavy lifting for me.
Other content types such as .jpg,.gif, and .pdf should simply be ignored. The multi-part emails contain at least 1 part that I an interested but many that are useless to me.
After further research this structure has some of the information I need but don't know how to get it out efficiently is another matter.
Dumping:$VAR1 = bless( {
'bodyparms' => {
'boundary' => '----=_NextPart_002_BC64_7D688C1F.A2FF9BE0'
},
'bodyextra' => undef,
'_top' => 1,
'bodydisp' => 'NIL',
'_id' => 'HEAD',
'bodysubtype' => 'mixed',
'PartsIndex' => {
'1.3' => bless( {
'bodyparms' => 'NIL',
'bodyid' => '<d9e26cc0-019c-4ac0-9b1e-9c9ac8424f52>',
'bodyextra' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.3',
'bodysubtype' => 'jpeg',
'_prefix' => '1.3',
'bodysize' => '4808',
'bodytype' => 'image',
'bodyMD5' => 'NIL',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' ),
'1.1' => bless( {
'bodyparms' => {
'boundary' => '----=_NextPart_000_36AE_880DDD08.0A776E35'
},
'bodyextra' => undef,
'bodydisp' => 'NIL',
'_id' => '1.1',
'bodysubtype' => 'alternative',
'_prefix' => '1.1',
'bodytype' => 'MULTIPART',
'bodystructure' => [
bless( {
'bodyparms' => {
'charset' => 'utf-8'
},
'bodyextra' => 'NIL',
'bodyid' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.1.1',
'bodysubtype' => 'PLAIN',
'_prefix' => '1.1.1',
'bodysize' => '1971',
'bodytype' => 'TEXT',
'bodyMD5' => 'NIL',
'textlines' => '74',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'quoted-printable'
}, 'Mail::IMAPClient::BodyStructure' ),
bless( {
'bodyparms' => {
'charset' => 'utf-8'
},
'bodyextra' => 'NIL',
'bodyid' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.1.2',
'bodysubtype' => 'HTML',
'_prefix' => '1.1.2',
'bodysize' => '23364',
'bodytype' => 'TEXT',
'bodyMD5' => 'NIL',
'textlines' => '331',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'quoted-printable'
}, 'Mail::IMAPClient::BodyStructure' )
],
'bodyloc' => 'NIL',
'bodylang' => 'NIL'
}, 'Mail::IMAPClient::BodyStructure' ),
'1' => bless( {
'bodyparms' => {
'boundary' => '----=_NextPart_001_EA96_2BF8DEDE.32622D51'
},
'bodyextra' => undef,
'bodydisp' => 'NIL',
'_id' => 1,
'bodysubtype' => 'related',
'_prefix' => 1,
'bodytype' => 'MULTIPART',
'bodystructure' => [
$VAR1->{'PartsIndex'}{'1.1'},
bless( {
'bodyparms' => 'NIL',
'bodyid' => '<5dff39db-e81c-4410-be75-8662564fd328>',
'bodyextra' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.2',
'bodysubtype' => 'jpeg',
'_prefix' => '1.2',
'bodysize' => '14406',
'bodytype' => 'image',
'bodyMD5' => 'NIL',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' ),
$VAR1->{'PartsIndex'}{'1.3'},
bless( {
'bodyparms' => 'NIL',
'bodyid' => '<717f2ef4-f795-4d1c-87cc-283c9b0a59b0>',
'bodyextra' => 'NIL',
'bodydisp' => 'NIL',
'_id' => '1.4',
'bodysubtype' => 'gif',
'_prefix' => '1.4',
'bodysize' => '2912',
'bodytype' => 'image',
'bodyMD5' => 'NIL',
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' )
],
'bodyloc' => 'NIL',
'bodylang' => 'NIL'
}, 'Mail::IMAPClient::BodyStructure' ),
'1.2' => $VAR1->{'PartsIndex'}{'1'}{'bodystructure'}[1],
'1.1.2' => $VAR1->{'PartsIndex'}{'1.1'}{'bodystructure'}[1],
'2' => bless( {
'bodyparms' => {
'name' => 'BKD-7361945220.pdf'
},
'bodyid' => 'NIL',
'bodyextra' => 'NIL',
'bodydisp' => {
'attachment' => {
'filename' => 'BKD-7361945220.pdf'
}
},
'_id' => 2,
'bodysubtype' => 'octetstream',
'_prefix' => 2,
'bodysize' => '47540',
'bodytype' => 'application',
'bodyMD5' => 'NIL',
'bodystructure' => [],
'bodylang' => 'NIL',
'bodydesc' => 'NIL',
'bodyenc' => 'base64'
}, 'Mail::IMAPClient::BodyStructure' ),
'1.4' => $VAR1->{'PartsIndex'}{'1'}{'bodystructure'}[3],
'1.1.1' => $VAR1->{'PartsIndex'}{'1.1'}{'bodystructure'}[0]
},
'_prefix' => 'HEAD',
'PartsList' => [
1,
'1.1',
'1.1.1',
'1.1.2',
'1.2',
'1.3',
'1.4',
2
],
'bodytype' => 'MULTIPART',
'bodystructure' => [
$VAR1->{'PartsIndex'}{'1'},
$VAR1->{'PartsIndex'}{'2'}
],
'bodyloc' => 'NIL',
'bodylang' => 'NIL'
}, 'Mail::IMAPClient::BodyStructure' );
As you can see none of the values are guaranteed to be part of every part on the PartsIndex and some them are nested.
variable of interest for each PartsIndex item:
bodytype
bodysubtype
bodyenc
Parse mail messages with Courriel:
use strictures;
use Mail::IMAPClient qw();
use Courriel qw();
sub walk_parts {
my ($obj, $callback) = #_;
if ($obj->is_multipart) {
for my $part ($obj->parts) {
walk_parts($part, $callback);
}
} else {
$callback->($obj);
}
}
my $imap = Mail::IMAPClient->new(
…
) or die $#;
my $folders = $imap->folders
or die $imap->LastError;
$imap->select('INBOX')
or die $imap->LastError;
my #messages = $imap->messages
or die $imap->LastError;
for my $id (#messages) {
my $raw = $imap->message_string($id)
or die $imap->LastError;
my $email = Courriel->parse(text => $raw);
walk_parts $email, sub {
my ($part) = #_;
my $content = $part->content;
my $type = $part->mime_type;
}
}
I tried using a couple of prebuilt modules but they had too many dependencies and was hard to work with. This solution adds no dependencies beyond the original. I also had issues with the dependencies for libMagic, see above, and I did not want anyone who uses my program to have to deal with that issue either.
You have to call decode twice once for the main parent, and again for each child. Since this $imap->get_bodystructure($msg); contains all the information you need why add dependencies where none are needed. It took many many hours to figure out how to decode it manually, but it was worth it.
You can add whatever decoders you want to the decode() subroutine. I only need to decode the text/html and base64 encoded versions there of. The IMAPClient functions give you a list of all parents and children so you don't have to go making a list by yourself. The tricky part is you can have any number of parent each with any number of children, but only the children contain useful data. The parents can be ignored, since many of their values are blank,undef, or 'NIL' (literally). In fact a vast number of variables have the value of 'NIL'. Even ones that the email client could have answered for the user like bodyMD5 and bodylang are USUALLY equal 'NIL'. Due to the overwhelming use of 'NIL' parsing and using other fields may prove futile. Depend on your imap server and the people you recieve email from you mileage may vary.
If you have further questions leave a comment.
use Mail::IMAPClient;
use MIME::Base64;
use MIME::Parser;
sub decode {
($process, $imap) =#_;
if ($process->bodytype eq "TEXT") {
print "\n Text SubType:".$process->bodysubtype;
if ($process->bodyenc eq "base64") {
return decode_base64($imap->bodypart_string($msg,$process->id));
}
elsif (index(" -7bit- -8bit- -quoted-printable- ",lc($process->bodyenc)) !=-1 ) {
return $imap->bodypart_string($msg,$process->id);
}
print "\n==========Insert new decoder here============";
print "\n".$imap->bodypart_string($msg,$process->id);
print "\n=================================================";
}
return "";
}
#insert your login code with credentials here
$imap->select( $sfolder )
or die "Select '$Opt{sfolder}' error: ", $imap->LastError, "\n";
my #msgs = $imap->messages or die "Could not messages: $#\n";
foreach $msg (#msgs) {
my $raw = $imap->message_string($msg)
or die $imap->LastError;
$struct = $imap->get_bodystructure($msg);
#MULTIPART is a container designation and does not contain anything useful by itself.
#However it will still process all of the children that have content
if ($struct->bodytype ne "MULTIPART") { print "\n BodyEnc:".$struct->bodyenc();}
$rDecode=decode($struct,$imap);
#do not insert blanks.
if ($rDecode ne "" && (length($rDecode)>2)) {push(#mail,$rDecode); }
foreach $dumpme ($struct->bodystructure()) {
if ($dumpme->bodytype() eq "MULTIPART") {next;}
$rDecode="";
$rDecode=decode($dumpme,$imap);
#do not insert blanks.
if (($rDecode ne "") && (length($rDecode)>2) ) {
push(#mail,$rDecode); }
}
}
You need a MIME parser. Unfortunately, even then, you will need some normalization of your own, because there are multiple ways to represent the same data in MIME.