WWW::Mechanize gives corrupted uploaded file name - perl

I have some weird problem while uploading a file with a Cyrillic name using WWW::Mechanize. The file is uploaded correctly but the name is broken (I see only ?????? on the target site).
The code is simple:
use WWW::Mechanize;
use Encode qw(from_to);
my $config = {
login => "login",
password => "pass",
source_folder => "$Bin/source_folder",
};
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->agent_alias("Windows IE 6");
$mech->get("http://www.antiplagiat.ru/Cabinet/Cabinet.aspx?folderId=689935");
authorize($mech);
$mech->submit_form(
form_number => 1,
fields => {},
button =>
'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$FolderControl_StdFolder_0$DocumentsGrid$btnAddItem',
);
find( \&wanted, $config->{source_folder} );
sub wanted {
return unless -f;
say $config->{source_folder} . "/" . $_;
#from_to($_, "CP1251", "UTF8"); doesn't work too :-(
my $mech = $mech->clone();
$mech->submit_form(
form_number => 1,
fields => {
'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$fuDocumentUpload' =>
$config->{source_folder} . "/" . $_,
},
button => 'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$btnCommitUpload',
);
}
If I encode the file name from CP1251 to UTF8 then the upload doesn't work. Please help me to find a solution.

Here is solution I use:
my $filename = $_;
from_to( $filename, "CP1251", "UTF8" );
my $mech = $mech->clone();
my $form = $mech->form_number(1);
$mech->field( 'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$fuDocumentUpload',
$config->{source_folder} . "/" . $_ );
$form->find_input(
'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$fuDocumentUpload')->filename($filename);
$mech->submit_form(
form_number => 1,
button => 'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$btnCommitUpload',
);

Related

mod_perl and CGI behavior

This has got to be something silly I'm doing wrong. It's such a newbie type problem.
The original script is something that sits and waits for a 3rd party to connect and POST some xml to it, it takes that xml, does some validation, and stores it in a db. That part is fine. The problem is my response. I'm trying to use the header() function from CGI and it's just not behaving. It comes up blank. Obviously I could just do this manually and just print the header string, but now I'm really curious why this is behaving so strangely.
Here is a stripped down test version of the cgi script:
use strict;
use warnings;
use Data::Dumper::Names;
use CGI qw(:standard);
use Apache2::Connection ();
use Apache2::RequestRec ();
$| = 1;
# Grab the request object provided by mod_perl.
our $request_obj = shift;
our $connection = $request_obj->connection;
our $remote_ip = $connection->client_ip();
my $cgi = CGI->new($request_obj->args());
print STDERR Dumper($cgi);
my $input = $cgi->param('POSTDATA');
print STDERR Dumper($input);
my $cgi_header = $cgi->header();
print STDERR Dumper($cgi_header);
my $cgi_full_header = $cgi->header(-type => 'application/xml');
print STDERR Dumper($cgi_full_header);
my $q = CGI->new({});
print STDERR Dumper($q);
my $q_header = $q->header();
print STDERR Dumper($q_header);
my $q_full_header = $q->header(-type => 'application/xml' );
print STDERR Dumper($q_full_header);
And the output:
$cgi = bless( {
'.r' => bless( do{\(my $o = '94118860562256')}, 'Apache2::RequestRec' ),
'param' => {
'POSTDATA' => [
'test'
],
'XForms:Model' => [
'test'
]
},
'use_tempfile' => 1,
'.fieldnames' => {},
'.charset' => 'ISO-8859-1',
'escape' => 1,
'.parameters' => [
'XForms:Model',
'POSTDATA'
]
}, 'CGI' );
$input = 'test';
$cgi_header = '';
$cgi_full_header = '';
$q = bless( {
'.parameters' => [
'XForms:Model',
'POSTDATA'
],
'escape' => 1,
'.fieldnames' => {},
'.charset' => 'ISO-8859-1',
'use_tempfile' => 1,
'.r' => bless( do{\(my $o = '94118860562256')}, 'Apache2::RequestRec' ),
'param' => {
'POSTDATA' => [
''
],
'XForms:Model' => [
''
]
}
}, 'CGI' );
$q_header = '';
$q_full_header = '';
And here is the simple test script I'm using to send the POST.
#!/perl/bin/perl
use strict;
use warnings;
use DBI;
use URI;
use LWP::UserAgent;
use Data::Dumper::Names;
my $ua = LWP::UserAgent->new;
$ua->max_size( 131072 );
$ua->agent('test_xml_pusher');
$ua->ssl_opts(verify_hostname => 0);
my $url = URI->new;
$url->scheme('https');
$url->host('xxxxxxxxxxxxxxxxxxxxxxxxx');
$url->port(443);
$url->path_segments('test.cgi');
# Yes, I know... it's not valid xml... don't care for the purposes of this test.
#
my $xml = 'test';
my $response = $ua->post( $url, Content => $xml, 'Content-Type' => 'application/xml' );
print Dumper($response);
my $status_line = $response->status_line;
print Dumper($status_line);
my $content = $response->content;
print Dumper($content);
So why is $cgi_header empty? And why does $q end up being a reference to the same thing as $cgi even though I tried initializing it as my $q = CGI->new({});? (I also tried empty quotes instead of empty brackets.)
Any thoughts?
Thanks!
My environment is a centos 7 server running apache httpd 2.4.34 with mod_perl 2.0.11 and perl 5.22.4. (httpd is installed from from SCL, but perl and mod_perl are installed from source.)
--
Andy

Perl Log::Dispatch: Change logging location mid-run?

I'm using Log::Dispatch in a large Mooseified app, via MooseX::LogDispatch. My setup is more or less:
use Moose;
with 'MooseX::LogDispatch';
has log_dispatch_conf => (
is => 'ro',
isa => 'HashRef',
lazy => 1,
required => 1,
default => sub {
my $self = shift;
return {
class => 'Log::Dispatch::File',
min_level => 'debug',
filename => $self->config->{logfile},
mode => '>>',
newline => 1
};
},
);
Is there any way I can change the location of the log file in the middle of a running process? My specific use case is that I'm processing a number of different large [things], which are passed in at runtime, and I'd like each [thing] to log to its own file. Something like this:
foreach my $thing (#things) {
my $logfile = $self->config->{log_base} . $thing->{name} . time() . ".log";
# do something here to set log location
$self->logger->info("Start processing " . $thing->{name} . " at " . scalar localtime());
# process $thing
}
Right. I abandoned MooseX::LogDispatch and did it myself.
When we have a new [thing], I just call a trigger to fire a _set_logger method:
sub _set_logger {
my ($self, $thing) = #_;
my $log_dir = $self->config->{log_dir}; # /path/to/log_dir
my $log_file_base = $self->config->{log_file_base}; # e.g. process-thing-log
my $t = localtime;
my $logfile = $log_dir . "/" . $log_file_base . $thing->{name} . "-" . $t->ymd . ".log";
my $logger = Log::Dispatch->new(
outputs => [
[ 'File',
min_level => 'debug',
filename => $logfile,
mode => '>>',
newline => 1,
],
],
);
$self->logger($logger);
$self->logger->info("Started run at " . scalar localtime);
}
Don't know if it's "right", but it's working smoothly.

How to use output from WWW::Mechanize?

I would like to loop through all links on a web page, so I have tried
#!/usr/bin/perl
use WWW::Mechanize;
my $url = "http://www.google.com";
my $m = WWW::Mechanize->new();
$m->get($url);
my #links = $m->find_all_links(url_regex => qr/google/);
foreach my $link (#links){
print Dumper $m->get($link->url_abs);
}
which gives me e.g.
$VAR11 = bless( [
'http://www.google.com/ncr',
'Google.com in English',
undef,
'a',
$VAR1->[4],
{
'href' => 'http://www.google.com/ncr',
'class' => 'gl nobr'
}
], 'WWW::Mechanize::Link' );
Question
How do I output just the links?
The documentation points out that the links are returned as WWW::Mechanize::Link objects. Therefore:
my #links = $m->find_all_links(url_regex => qr/google/);
print $_->url, "\n" for #links;

How do I work with just one key and value from Data::Dumper output

I have data dumper outputting a remotely hosted xml file into a local text file and I am getting the following info:
$VAR1 = {
'resource' => {
'005cd410-41d6-4e3a-a55f-c38732b73a24.xml' => {
'standard' => 'DITA',
'area' => 'holding',
'id' => 'Comp_UKCLRONLINE_UKCLR_2000UKCLR0278',
},
'003c2a5e-4af3-4e70-bf8b-382d0b4edda1.xml' => {
'standard' => 'DITA',
'area' => 'holding',
'id' => 'Comp_UKCLRONLINE_UKCLR_2000UKCLR0278',
},
etc. What I want to do is work with just one/key and value in each resource. Ie pick out the ID and then create a url from that.
I would normally use a regex on the file and pull the info I need from that but I'm thinking there must be an easier/proper way but can't think of the right term to use in a search and am therefore not finding it.
Here is the code I am using to write this output to a file:
#-----------------------------------------------
sub request_url {
#-----------------------------------------------
my $useragent = LWP::UserAgent->new;
my $request = HTTP::Request->new( GET => "http://digitalessence.net/resource.xml" );
$resource = $useragent->request( $request );
}
#-----------------------------------------------
sub file_write {
#-----------------------------------------------
open OUT, ">$OUT" or Log_message ("\n$DATE - $TIME - Could not create filelist.doc \t");
Log_message ("\n$DATE - $TIME - Opened the output file");
print OUT Dumper (XML::Simple->new()->XMLin( $resource->content ));
Log_message ("\n$DATE - $TIME - Written the output file");
}
thanks
I'm not really understanding your question, but I'm guessing you want to access some data from the hash.
You don't need a regex or other strage stuff; just `do` your data and get the value from the hassref you get back:
A simple one liner as an example (assuming your file is called `dumper.out`):
perl -Mstrict -wE 'my $hashref = do{ do "dumper.out" }; say $hashref->{resource}{"005cd410-41d6-4e3a-a55f-c38732b73a24.xml"}{id}'
HTH, Paul
Maybe you want to walk the data structure built by XML::Simple.
Each resource is inside an ARRAYREF you get using the resource key with $doc data structure.
use XML::Simple;
use LWP;
use Data::Dumper;
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new( GET => "http://digitalessence.net/resource.xml" );
my $res = $ua->request( $req );
my $xs = XML::Simple->new();
my $doc = $xs->XMLin( $res->content );
printf "resources: %s\n", scalar keys %{ $doc->{ resource } };
foreach ( keys %{ $doc->{ resource } } ) {
printf "resource => %s, id => %s\n", $_, $doc->{ resource }->{ $_ }->{ id };
}
The output is this:
resources: 7
resource => 005cd410-41d6-4e3a-a55f-c38732b73a24.xml, id => Comp_UKCLRONLINE_UKCLR_2000UKCLR0278
resource => 003c2a5e-4af3-4e70-bf8b-382d0b4edda1.xml, id => Comp_UKCLRONLINE_UKCLR_2002UKCLR0059
resource => 0033d4d3-c397-471f-8cf5-16fb588b0951.xml, id => Comp_UKCLRONLINE_UKCLR_navParentTopic_67
resource => 002a770a-db47-41ef-a8bb-0c8aa45a8de5.xml, id => Comp_UKCLRONLINE_UKCLR_navParentTopic_308
resource => 000fff79-45b8-4ac3-8a57-def971790f16.xml, id => Comp_UKCLRONLINE_UKCLR_2002UKCLR0502
resource => 00493372-c090-4734-9a50-8f5a06489591.xml, id => Comp_UKCLRONLINE_COMPCS_2010_10_0002
resource => 004377bf-8e24-4a69-9411-7c6baca80b87.xml, id => Comp_CLJONLINE_CLJ_2002_01_11

Why does WWW::Mechanize and login-data break when I switch from a query string to a hash?

The following script works fine:
#!/usr/bin/env perl
use strict; use warnings;
use Data::Dumper;
use WWW::Mechanize;
my $loginData = "userName=username&password=password&deeplinkForward=%2Fselfcare%2Frestricted%2FprepareCoCo.do&x=84&y=7";
my $loginUrl = "https://www.login.login/login.do";
my $mech = WWW::Mechanize->new( show_progress => 1 );
my $req = $mech->post( $loginUrl, 'Content' => $loginData );
my $content = $req->content();
print Dumper $content;
But when I replace the line
my $req = $mech->post( $loginUrl, 'Content' => $loginData );
with
my %hash = (
'username' => 'username',
'password' => 'password',
'deeplinkForward' => '%2Fselfcare%2Frestricted%2FprepareCoCo.do',
'x' => '84',
'y' => '7'
);
my $req = $mech->post( $loginUrl, 'Content' => \%hash );
it doesn't work any more ( the script works, but the login doesn't ). Is there something worng?
You have to unescape deeplinkForward:
'deeplinkForward' => '/selfcare/restricted/prepareCoCo.do',
Otherwise, WWW::Mechanize thinks you want to send literal % signs, and helpfully escapes them for you.
To see what's going wrong, try adding this code right before the $mech->post line:
use HTTP::Request::Common 'POST';
print POST( $loginUrl, 'Content' => $loginData )->as_string;
print POST( $loginUrl, 'Content' => \%hash )->as_string;
They should be the same, except for the order of the fields.
It's conceivable that the server requires the fields to be listed in that order (it shouldn't, but...). In that case, you can use an array instead of a hash (hashes don't preserve ordering). Just replace %hash with #fields everywhere it appears.
print POST( $loginUrl, 'Content' => \#fields )->as_string;
i don't have mechanize in place, but you can try this and see how it goes
my $req = $mech->post( $loginUrl, \%hash);