How to use output from WWW::Mechanize? - perl

I would like to loop through all links on a web page, so I have tried
#!/usr/bin/perl
use WWW::Mechanize;
use Data::Dumper;
my $url = "http://www.google.com";
my $m = WWW::Mechanize->new();
$m->get($url);
my @links = $m->find_all_links(url_regex => qr/google/);
foreach my $link (@links){
print Dumper $m->get($link->url_abs);
}
which gives me e.g.
$VAR11 = bless( [
'http://www.google.com/ncr',
'Google.com in English',
undef,
'a',
$VAR1->[4],
{
'href' => 'http://www.google.com/ncr',
'class' => 'gl nobr'
}
], 'WWW::Mechanize::Link' );
Question
How do I output just the links?

The documentation points out that the links are returned as WWW::Mechanize::Link objects. Therefore:
my @links = $m->find_all_links(url_regex => qr/google/);
print $_->url, "\n" for @links;
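As a minimal sketch of the accessors on such an object, the snippet below builds a WWW::Mechanize::Link by hand rather than fetching a page, so it runs offline. The relative href and the base URL are made-up values for illustration; in the script above the objects would come from find_all_links.

```perl
use strict;
use warnings;
use WWW::Mechanize::Link;

# Hand-built link for illustration only; normally find_all_links returns these.
my $link = WWW::Mechanize::Link->new({
    url   => '/ncr',                     # href as written in the page (assumed relative here)
    text  => 'Google.com in English',
    name  => undef,
    tag   => 'a',
    base  => 'http://www.google.com/',   # assumed base URL of the page
    attrs => { href => '/ncr', class => 'gl nobr' },
});

my $raw      = $link->url;       # the href exactly as it appeared
my $absolute = $link->url_abs;   # URI object, resolved against the base
my $label    = $link->text;      # the link text

print "$raw\n$absolute\n$label\n";
```

Use url when the raw attribute is enough, and url_abs when the result is going to be passed back to get.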

Related

mod_perl and CGI behavior

This has got to be something silly I'm doing wrong. It's such a newbie type problem.
The original script is something that sits and waits for a 3rd party to connect and POST some xml to it, it takes that xml, does some validation, and stores it in a db. That part is fine. The problem is my response. I'm trying to use the header() function from CGI and it's just not behaving. It comes up blank. Obviously I could just do this manually and just print the header string, but now I'm really curious why this is behaving so strangely.
Here is a stripped down test version of the cgi script:
use strict;
use warnings;
use Data::Dumper::Names;
use CGI qw(:standard);
use Apache2::Connection ();
use Apache2::RequestRec ();
$| = 1;
# Grab the request object provided by mod_perl.
our $request_obj = shift;
our $connection = $request_obj->connection;
our $remote_ip = $connection->client_ip();
my $cgi = CGI->new($request_obj->args());
print STDERR Dumper($cgi);
my $input = $cgi->param('POSTDATA');
print STDERR Dumper($input);
my $cgi_header = $cgi->header();
print STDERR Dumper($cgi_header);
my $cgi_full_header = $cgi->header(-type => 'application/xml');
print STDERR Dumper($cgi_full_header);
my $q = CGI->new({});
print STDERR Dumper($q);
my $q_header = $q->header();
print STDERR Dumper($q_header);
my $q_full_header = $q->header(-type => 'application/xml' );
print STDERR Dumper($q_full_header);
And the output:
$cgi = bless( {
'.r' => bless( do{\(my $o = '94118860562256')}, 'Apache2::RequestRec' ),
'param' => {
'POSTDATA' => [
'test'
],
'XForms:Model' => [
'test'
]
},
'use_tempfile' => 1,
'.fieldnames' => {},
'.charset' => 'ISO-8859-1',
'escape' => 1,
'.parameters' => [
'XForms:Model',
'POSTDATA'
]
}, 'CGI' );
$input = 'test';
$cgi_header = '';
$cgi_full_header = '';
$q = bless( {
'.parameters' => [
'XForms:Model',
'POSTDATA'
],
'escape' => 1,
'.fieldnames' => {},
'.charset' => 'ISO-8859-1',
'use_tempfile' => 1,
'.r' => bless( do{\(my $o = '94118860562256')}, 'Apache2::RequestRec' ),
'param' => {
'POSTDATA' => [
''
],
'XForms:Model' => [
''
]
}
}, 'CGI' );
$q_header = '';
$q_full_header = '';
And here is the simple test script I'm using to send the POST.
#!/perl/bin/perl
use strict;
use warnings;
use DBI;
use URI;
use LWP::UserAgent;
use Data::Dumper::Names;
my $ua = LWP::UserAgent->new;
$ua->max_size( 131072 );
$ua->agent('test_xml_pusher');
$ua->ssl_opts(verify_hostname => 0);
my $url = URI->new;
$url->scheme('https');
$url->host('xxxxxxxxxxxxxxxxxxxxxxxxx');
$url->port(443);
$url->path_segments('test.cgi');
# Yes, I know... it's not valid xml... don't care for the purposes of this test.
#
my $xml = 'test';
my $response = $ua->post( $url, Content => $xml, 'Content-Type' => 'application/xml' );
print Dumper($response);
my $status_line = $response->status_line;
print Dumper($status_line);
my $content = $response->content;
print Dumper($content);
So why is $cgi_header empty? And why does $q end up being a reference to the same thing as $cgi even though I tried initializing it as my $q = CGI->new({});? (I also tried empty quotes instead of empty brackets.)
Any thoughts?
Thanks!
My environment is a CentOS 7 server running Apache httpd 2.4.34 with mod_perl 2.0.11 and perl 5.22.4. (httpd is installed from SCL, but perl and mod_perl are installed from source.)
--
Andy

How to convert a query string into a hash in Perl

I have a query string like this:
id=60087888;jid=16471827;from=advance;action=apply
or it can be like this :
id=60087888&jid=16471827&from=advance&action=apply
From this I want to create a hash whose keys are the parameter names (such as id) and whose values are the corresponding values.
I have done this:
my %in;
$buffer = 'resid=60087888;jobid=16471827;from=advance;action=apply';
@pairs = split(/=/, $buffer);
foreach $pair (@pairs){
($name, $value) = split(/=/, $pair);
$in{$name} = $value;
}
print %in;
The issue is that the separator in the query string can be either a semicolon or &. How can I handle both? Please help.
Don't try to solve it with new code; this is what CPAN modules are for. Specifically, in this case, URI::Query:
use URI::Query;
use Data::Dumper;
my $q = URI::Query->new( "resid=60087888;jobid=16471827;from=advance;action=apply" );
my %hash = $q->hash;
print Dumper( \%hash );
Gives
{ action => 'apply',
from => 'advance',
jobid => '16471827',
resid => '60087888' }
You already have an answer that works, but personally I might tackle it like this:
my %in = $buffer =~ m/(\w+)=(\w+)/g;
This uses a regular expression to match the text on either side of each equals sign.
With the /g flag in list context it returns the captures in pairs, and that list is treated as a sequence of key-value pairs in the hash assignment.
Note - it does assume you've not got special characters in your keys/values, and that you have no null values. (Or if you do, they'll be ignored - you can use (\w*) instead if that's the case).
But you get:
$VAR1 = {
'from' => 'advance',
'jid' => '16471827',
'action' => 'apply',
'id' => '60087888'
};
Alternatively:
my %in = map { split /=/ } split ( /[^=\w]/, $buffer );
We split using 'anything that isn't word or equals' to get a sequence, and then split on equals to make the same key-value pairs. Again - certain assumptions are made about valid delimiter/non-delimiter characters.
Check this answer:
my %in;
$buffer = 'resid=60087888;jobid=16471827;from=advance;action=apply';
@pairs = split(/[&,;]/, $buffer);
foreach $pair (@pairs){
($name, $value) = split(/=/, $pair);
$in{$name} = $value;
}
delete $in{resid};
print keys %in;
I know I'm late to the game, but....
#!/usr/bin/perl
use strict;
use CGI;
use Data::Dumper;
my $query = 'id=60087888&jid=16471827&from=advance&action=apply&blank=&not_blank=1';
my $cgi = CGI->new($query);
my %hash = $cgi->Vars();
print Dumper \%hash;
will produce:
$VAR1 = {
'not_blank' => '1',
'jid' => '16471827',
'from' => 'advance',
'blank' => '',
'action' => 'apply',
'id' => '60087888'
};
Which has the added benefit of dealing with keys that might not have values in the source string.
Some of the other examples will produce:
$VAR1 = {
'id' => '60087888',
'1' => undef,
'jid' => '16471827',
'from' => 'advance',
'blank' => 'not_blank',
'action' => 'apply'
};
which may not be desirable.
I would have used URI::Query, as in @LeoNerd's answer, but I couldn't install a module in my case and CGI.pm was handy.
Also, you could do this:
my $buffer = 'id=60087888&jid=16471827&from=advance&action=apply';
my %hash = split(/&|=/, $buffer);
which gives:
$hash = {
'jid' => '16471827',
'from' => 'advance',
'action' => 'apply',
'id' => '60087888'
};
This is VERY fragile, so I wouldn't advocate using it.
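For comparison, here is a core-Perl sketch that accepts both ; and & as separators, keeps blank values, and decodes %XX and + escapes. The parse_query helper is hypothetical and not part of any module. It assumes single-byte escapes and lets the last occurrence of a repeated key win, so URI::Query or CGI.pm remain the safer choice where they are available.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a query string whose pairs are separated
# by either '&' or ';' into a flat hash.
sub parse_query {
    my ($query) = @_;
    my %in;
    for my $pair (grep { length } split /[&;]/, $query) {
        # A limit of 2 keeps any '=' inside the value intact.
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;       # "blank=" or a bare "blank"
        for ($name, $value) {
            tr/+/ /;                             # '+' encodes a space
            s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # decode %XX escapes
        }
        $in{$name} = $value;                     # a repeated key overwrites the earlier one
    }
    return %in;
}

my %semi = parse_query('id=60087888;jid=16471827;from=advance;action=apply');
my %amp  = parse_query('id=60087888&jid=16471827&blank=&not_blank=1');

print "$_=$semi{$_}\n"  for sort keys %semi;
print "$_=[$amp{$_}]\n" for sort keys %amp;
```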

perl retrieving page details after mechanize::POST

I am trying to gather data from a website. Some anti-patterns make finding the right form objects difficult, but I have solved that. I am using a POST request to get around some JavaScript that acts as a wrapper for submitting the form. My problem seems to be in getting the results from the mechanize->post method.
Here's a shortened version of my code.
use strict;
use warnings;
use HTML::Tree;
use LWP::Simple;
use WWW::Mechanize;
use HTTP::Request::Common;
use Data::Dumper;
$| = 1;
my $site_url = "http://someURL";
my $mech = WWW::Mechanize->new( autocheck => 1 );
foreach my $number (@numbers)
{
my $content = get($site_url);
$mech->get ($site_url);
my $tree = HTML::Tree->new();
$tree->parse($content);
my ($title) = $tree->look_down( '_tag' , 'a' );
my $atag = "";
my $atag1 = "";
foreach $atag ( $tree->look_down( _tag => q{a}, 'class' => 'button', 'title' => 'SEARCH' ) )
{
print "Tag is ", $atag->attr('id'), "\n";
$atag1 = Dumper $atag->attr('id');
}
# Enter permit number in "Number" search field
my @forms = $mech->forms;
my @fields = ();
foreach my $form (@forms)
{
@fields = $form->param;
}
my ($name, $fnumber) = $fields[2];
print "field name and number is $name\n";
$mech->field( $name, $number, $fnumber );
print "field $name populated with search data $number\n" if $mech->success();
$mech->post($site_url ,
[
'$atag1' => $number,
'internal.wdk.wdkCommand' => $atag1,
]) ;
print $mech->content; # I think this is where the problem is.
}
The data I get from my final print statement is the data from the original URL, not the page the POST command should take me to. What have I done wrong?
Many Thanks
Update
I don't have Firefox installed so I'm avoiding WWW::Mechanize::Firefox intentionally.
Turns out I was excluding some required hidden fields from my POST command.
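One way to avoid dropping hidden fields is to let HTML::Form (the class behind the forms WWW::Mechanize returns) list every field it would submit. The sketch below parses a made-up page offline; the form action and the field names session_token and permit_number are assumptions for illustration and will differ on a real site.

```perl
use strict;
use warnings;
use HTML::Form;

# Made-up page for illustration; the real form and field names will differ.
my $html = <<'HTML';
<form action="http://example.invalid/search" method="post">
  <input type="hidden" name="session_token" value="abc123">
  <input type="text"   name="permit_number" value="">
  <input type="submit" name="go" value="Search">
</form>
HTML

# In scalar context parse() returns the first form on the page.
my $form = HTML::Form->parse($html, 'http://example.invalid/');

# Fill in the visible field; hidden inputs keep the values the page supplied.
$form->value(permit_number => '12345');

# form() returns name/value pairs for the inputs that would be submitted,
# hidden ones included.
my @pairs  = $form->form;
my %fields = @pairs;
print "$_=$fields{$_}\n" for sort keys %fields;

# With a live WWW::Mechanize object the same pairs could be sent as:
#   $mech->post($form->action, \@pairs);
```

Starting from the form's own pairs and overriding only the values that need to change keeps whatever hidden state the server expects to get back.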

WWW::Mechanize gives corrupted uploaded file name

I have some weird problem while uploading a file with a Cyrillic name using WWW::Mechanize. The file is uploaded correctly but the name is broken (I see only ?????? on the target site).
The code is simple:
use feature 'say';
use FindBin qw($Bin);
use File::Find;
use WWW::Mechanize;
use Encode qw(from_to);
my $config = {
login => "login",
password => "pass",
source_folder => "$Bin/source_folder",
};
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->agent_alias("Windows IE 6");
$mech->get("http://www.antiplagiat.ru/Cabinet/Cabinet.aspx?folderId=689935");
authorize($mech);
$mech->submit_form(
form_number => 1,
fields => {},
button =>
'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$FolderControl_StdFolder_0$DocumentsGrid$btnAddItem',
);
find( \&wanted, $config->{source_folder} );
sub wanted {
return unless -f;
say $config->{source_folder} . "/" . $_;
# from_to($_, "CP1251", "UTF8"); doesn't work either :-(
my $mech = $mech->clone();
$mech->submit_form(
form_number => 1,
fields => {
'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$fuDocumentUpload' =>
$config->{source_folder} . "/" . $_,
},
button => 'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$btnCommitUpload',
);
}
If I encode the file name from CP1251 to UTF8 then the upload doesn't work. Please help me to find a solution.
Here is the solution I use:
my $filename = $_;
from_to( $filename, "CP1251", "UTF8" );
my $mech = $mech->clone();
my $form = $mech->form_number(1);
$mech->field( 'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$fuDocumentUpload',
$config->{source_folder} . "/" . $_ );
$form->find_input(
'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$fuDocumentUpload')->filename($filename);
$mech->submit_form(
form_number => 1,
button => 'ctl00$ctl00$Body$MainWorkSpacePlaceHolder$btnCommitUpload',
);

Why does WWW::Mechanize and login-data break when I switch from a query string to a hash?

The following script works fine:
#!/usr/bin/env perl
use strict; use warnings;
use Data::Dumper;
use WWW::Mechanize;
my $loginData = "userName=username&password=password&deeplinkForward=%2Fselfcare%2Frestricted%2FprepareCoCo.do&x=84&y=7";
my $loginUrl = "https://www.login.login/login.do";
my $mech = WWW::Mechanize->new( show_progress => 1 );
my $req = $mech->post( $loginUrl, 'Content' => $loginData );
my $content = $req->content();
print Dumper $content;
But when I replace the line
my $req = $mech->post( $loginUrl, 'Content' => $loginData );
with
my %hash = (
'username' => 'username',
'password' => 'password',
'deeplinkForward' => '%2Fselfcare%2Frestricted%2FprepareCoCo.do',
'x' => '84',
'y' => '7'
);
my $req = $mech->post( $loginUrl, 'Content' => \%hash );
it doesn't work any more (the script runs, but the login fails). Is there something wrong?
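You have to unescape deeplinkForward:
'deeplinkForward' => '/selfcare/restricted/prepareCoCo.do',
Otherwise, WWW::Mechanize thinks you want to send literal % signs, and helpfully escapes them for you, so the server receives %252F instead of %2F.
Note also that the query string uses userName while the hash uses username; if the server treats field names as case-sensitive, that difference alone could break the login.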
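A minimal sketch of the double escaping, using HTTP::Request::Common to build the request body without sending anything. The field value is shortened for readability; the URL is the question's placeholder.

```perl
use strict;
use warnings;
use HTTP::Request::Common 'POST';

my $url = 'https://www.login.login/login.do';

# Value already percent-encoded by hand: the '%' sign gets encoded again.
my $double = POST($url, Content => [ deeplinkForward => '%2Fselfcare' ])->content;

# Plain value: encoded exactly once.
my $single = POST($url, Content => [ deeplinkForward => '/selfcare' ])->content;

print "$double\n";   # deeplinkForward=%252Fselfcare
print "$single\n";   # deeplinkForward=%2Fselfcare
```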
To see what's going wrong, try adding this code right before the $mech->post line:
use HTTP::Request::Common 'POST';
print POST( $loginUrl, 'Content' => $loginData )->as_string;
print POST( $loginUrl, 'Content' => \%hash )->as_string;
They should be the same, except for the order of the fields.
It's conceivable that the server requires the fields to be listed in that order (it shouldn't, but...). In that case, you can use an array instead of a hash (hashes don't preserve ordering). Just replace %hash with @fields everywhere it appears.
print POST( $loginUrl, 'Content' => \@fields )->as_string;
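A sketch of what that array could look like, assuming the same placeholder names and values as the question's working query string (including its userName spelling). Only the request body is built here; nothing is sent.

```perl
use strict;
use warnings;
use HTTP::Request::Common 'POST';

my $loginUrl = 'https://www.login.login/login.do';

# Array of name/value pairs: unlike a hash, this keeps the order as written.
my @fields = (
    userName        => 'username',
    password        => 'password',
    deeplinkForward => '/selfcare/restricted/prepareCoCo.do',   # unescaped on purpose
    x               => '84',
    y               => '7',
);

my $body = POST($loginUrl, Content => \@fields)->content;
print "$body\n";
```

The printed body should match the original $loginData string, which makes it easy to compare the two requests by eye.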
I don't have Mechanize installed to test with, but you can try this and see how it goes:
my $req = $mech->post( $loginUrl, \%hash);