How to overwrite a function used in a module-method? - perl

#!/usr/bin/env perl
use warnings;
use 5.012;
use utf8;
use WWW::Mechanize::Cached;
use Some::Module qw(some_method);
my $url = '...';
my $result = some_method( $url );
The some_method() uses itself get() form LWP::Simple.
How could I overwrite the get() with my my_get() in this script?
sub my_get {
my $url;
my $mech = WWW::Mechanize::Cached->new();
$mech->get( $url );
my $content = $mech->content( format => 'text' );
return $content;
}

sub WWW::Mechanize::Cached::get {
# your code
}
OR, if the get method is actually, as you imply in the question, is inherited from LWP::Simple -
sub LWP::Simple::get {
# your code
}

Related

Running perl code using `atom` throws me Undefined subroutine &main::send_request

I'm writing my first perl script for the requirement
generate HTTP request against a particular web uri in succession using different URL scheme patterns
use HTTP::Request::Generator 'generate_requests';
use URI;
use HTTP::Request::Common;
use strict; # safety net
use warnings; # safety ne
use Test::LWP::UserAgent 'send_request';
use LWP::UserAgent 'send_request';
use Test::More;
use URI;
use HTTP::Request::Common;
use LWP::UserAgent;
my $g = generate_requests(
method => 'POST',
host => ['example.com','www.example.com'],
pattern => 'https://example.com/{bar,foo,gallery}/[00..99].html',
wrap => sub {
my( $req ) = #_;
# Fix up some values
$req->{headers}->{'Content-Length'} = 666;
},
);
while( my $r = $g->()) {
send_request( $r );
};
I'm using atom editor and activeperl on windows 10, I get following error from running above code.
Undefined subroutine &main::send_request called at C:\Users\ADMINI~1\AppData\Local\Temp\atom_script_tempfiles\0ac821e0-0886-11eb-9588-291dbc37d883 line 57.
I have already installed all necessary modules and lib but i think its unable to refer the method send_request. Pls assist.
NOTE
I have replaced real values in variable for privacy reasons.
UPDATE
I plan to use following module
pattern => 'https://example.{com,org,net}/page_[00..99].html', from
https://metacpan.org/pod/HTTP::Request::Generator.
LWP::UserAgent is an object-oriented module. It doesn't export functions. You want to call send_request like this:
my $ua = 'LWP::UserAgent'->new;
while ( my $r = $g->() ) {
$ua->send_request( $r );
}
That said, send_request is an undocumented internal method. I think it is probably more intended for people who are subclassing LWP::UserAgent. You probably want the request method instead.
my $ua = 'LWP::UserAgent'->new;
while ( my $r = $g->() ) {
my $response = $ua->request( $r );
}
Full code:
use strict;
use warnings;
use HTTP::Request::Generator 'generate_requests';
use LWP::UserAgent;
my $ua = 'LWP::UserAgent'->new;
my $gen = generate_requests(
method => 'POST',
host => [ 'example.com', 'www.example.com' ],
pattern => 'https://example.com/{bar,foo,gallery}/[00..99].html',
wrap => sub {
my ( $req ) = #_;
# Fix up some values
$req->{'headers'}{'Content-Length'} = 666;
},
);
while ( my $req = $gen->() ) {
my $response = $ua->request( $req );
# Do something with $response here?
}

Can I decode gzip when using the mirror function in LWP Useragent?

Can I use something like $response->decoded_content within a LWP UserAgent 'mirror' request? Thank you.
When using mirror() the received data is not added to the response object directly, but instead written directly to the mirror file. This means that decoded_content() will not work. However, you can add a response_header that enables the storage of the received data:
use strict;
use warnings;
use LWP::UserAgent ();
my $ua = LWP::UserAgent->new;
my $fn = 'libwww-perl-6.41.tar.gz'; # Example file..
my $url = 'https://cpan.metacpan.org/authors/id/O/OA/OALDERS/'. $fn;
$ua->add_handler(
response_header => sub {
my($response, $ua, $handler) = #_;
$response->{default_add_content} = 1;
}
);
my $response = $ua->mirror($url, $fn);
if ( $response->is_success ) {
if ( $response->header('Content-Type') eq 'application/x-gzip') {
$response->header('Content-Encoding' => 'gzip');
}
my $decoded_content = $response->decoded_content;
# Do someting with the decoded content here ...
}

Getting Absolute URLs with module creating object outside loop

I have a doubt I've been trying to solve myself using CPAN modules documentation, but I'm a bit new and I'm confused with some terminology and sections within the different modules.
I'm trying to create the object in the code below, and get the absolute URL for relative links extracted from a website.
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);
use URI;
my $url = $ARGV[0];
if ($url !~ m{^https?://[^\W]+-?\.com/?}i) {
exit(0);
}
my $ua = LWP::UserAgent->new;
$ua->timeout( 10 );
my $response = $ua->get( $url );
my $content = $response->decoded_content();
my $links = URI->new($content);
my $abs = $links->abs('http:', $content);
my $abs_links = $links->abs($abs);
while ($content =~ m{<a[^>]\s*href\s*=\s*"?([^"\s>]+)}gis) {
$abs_links = $1;
print "$abs_links\n";
print "Digest for the above URL is " . md5_hex($abs_links) . "\n";
}
The problem is when I try to add that part outside the While loop (the 3-line block preceding the loop), it does not work, whereas if I add the same part in the While loop, it will work fine. This one just gets the relative URLs from a given website, but instead of printing "Http://..." it prints "//...".
The script that works fine for me is the following:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Digest::MD5 qw(md5_hex);
use URI::URL;
my $url = $ARGV[0]; ## Url passed in command
if ($url !~ m{^https?://[\w]+-?[\w]+\.com/?}i) {
exit(0); ## Program stops if not valid URL
}
my $ua = LWP::UserAgent->new;
$ua->timeout( 10 );
my $response = $ua->get( $url ); ## Get response, not content
my $content = $response->decoded_content(); ## Now let's get the content
while ($content =~ m{<a[^>]\s*href\s*=\s*"?([^"\s>]+)}gis) { ## All links
my $links = $1;
my $abs = new URI::URL "$links";
my $abs_url = $abs->abs('http:', $links);
print "$abs_url\n";
print "Digest for the above URL is " . md5_hex($abs_url) . "\n";
}
Any ideas? Much appreciated.
I don't understand your code. There are a few weird bits:
[^\W] is the same as \w
The regex allows an optional - before and an optional / after .com, i.e. http://bitwise.complement.biz matches but http://cool-beans.com doesn't.
URI->new($content) makes no sense: $content is random HTML, not a URI.
$links->abs('http:', $content) makes no sense: $content is simply ignored, and $links->abs('http:') tries to make $links an absolute URL relative to 'http:', but 'http:' is not a valid URL.
Here's what I think you're trying to do:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use Digest::MD5 qw(md5_hex);
#ARGV == 1 or die "Usage: $0 URL\n";
my $url = $ARGV[0];
my $ua = LWP::UserAgent->new(timeout => 10);
my $response = $ua->get($url);
$response->is_success or die "$0: " . $response->request->uri . ": " . $response->status_line . "\n";
my $content = $response->decoded_content;
my $base = $response->base;
my #links;
my $p = HTML::LinkExtor->new(
sub {
my ($tag, %attrs) = #_;
if ($tag eq 'a' && $attrs{href}) {
push #links, "$attrs{href}"; # stringify
}
},
$base,
);
$p->parse($content);
$p->eof;
for my $link (#links) {
print "$link\n";
print "Digest for the above URL is " . md5_hex($link) . "\n";
}
I don't try to validate the URL passed in $ARGV[0]. Leave it to LWP::UserAgent. (If you don't like this, just add the check back in.)
I make sure $ua->get($url) was successful before proceeding.
I get the base URL for absolutifying relative links from $response->base.
I use HTML::LinkExtor for parsing the content, extracting links, and making them absolute.
I think your biggest mistake is trying to parse links out of HTML using a regular expression. You would be far better advised to use a CPAN module for this. I'd recommend WWW::Mechanize, which would make your code look something like this:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use WWW::Mechanize;
use Digest::MD5 qw(md5_hex);
use URI;
my $url = $ARGV[0];
if ($url !~ m{^https?://[^\W]+-?\.com/?}i) {
exit(0);
}
my $ua = WWW::Mechanize->new;
$ua->timeout( 10 );
$ua->get( $url );
foreach ($ua->links) {
say $_->url;
say "Digest for the above URL is " . md5_hex($_->url) . "\n";
}
That looks a lot simpler to me.

issues with CGI::Minimal

My existing code works fine with vanilla cgi
#!/usr/bin/env perl
use strict;
use CGI;
use CGI::Carp qw/fatalsToBrowser warningsToBrowser/;
use CGI::Session;
my ( $session, $cgi);
$cgi = new CGI();
$cgi->charset('UTF-8');
$session = new CGI::Session( "driver:File;serializer:Storable",
$cgi, { Directory => '../home/tmp' } );
$session->expire( 'authorized', '1440m' );
changing over to CGI::Minimal causes CGI::Session to error out
#!/usr/bin/env perl
use strict;
use CGI::Minimal;
use CGI::Carp qw/fatalsToBrowser warningsToBrowser/;
use CGI::Session;
my ( $session, $cgi);
my $cgi = CGI::Minimal->new;
$session = new CGI::Session( "driver:File;serializer:Storable",
$cgi, { Directory => '../web/tmp' } ) or or die CGI::Session->errstr();
$session->expire( 'authorized', '1440m' );
The error
Can't call method "expire" on an undefined value at /var/webserver/iris/htdocs/index.cgi line 13.
Edit: after or die CGI::Session->errstr() is added
new(): failed: query object CGI::Minimal=HASH(0x9916a64) does not support cookie() and param() methods:
Not really sure what causes this, really appreciate any insight.
Since CGI::Minimal is indeed rather minimal you'd have to handle Cookies yourself and pass the manually retrieved session_id, instead of the cgi object, to the CGI::Session constructor.
use warnings;
use strict;
use CGI::Cookie;
use CGI::Minimal;
use CGI::Carp qw/fatalsToBrowser warningsToBrowser/;
use CGI::Session;
my $cgi = CGI::Minimal->new();
my %cookies = CGI::Cookie->fetch;
my $session_id;
if( defined $cookies{'SESSION_ID'} ){
$session_id = $cookies{'SESSION_ID'}->value;
}
my $session = CGI::Session->new(
"driver:File;serializer:Storable",
$session_id,
{ Directory => '../web/tmp' }
) or die CGI::Session->errstr();
$session->expire( 'authorized', '1440m' );
my $session_cookie = CGI::Cookie->new(-name => 'SESSION_ID',-value => $session->id());
print $session->header(-cookie=>[$session_cookie]);
print 'ok with session_id: ' . $session->id();

how to get POST values in perl

I am trying to customize a script and need to get a POST value from a form using perl.
I have no background of perl but this is a fairly simple thing so I guess it should not be hard.
This is the php version of the code I would like to have in PERL:
<?php
$download = ($_POST['dl']) ? '1' : '0';
?>
I know this may not be at all related to the PERL version but it could help I guess clarifying what exactly I am looking to do.
Well, in that case please have a look at this simple code: This would help you:
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
sub output_top($);
sub output_end($);
sub display_results($);
sub output_form($);
my $q = new CGI;
print $q->header();
# Output stylesheet, heading etc
output_top($q);
if ($q->param()) {
# Parameters are defined, therefore the form has been submitted
display_results($q);
} else {
# We're here for the first time, display the form
output_form($q);
}
# Output footer and end html
output_end($q);
exit 0;
# Outputs the start html tag, stylesheet and heading
sub output_top($) {
my ($q) = #_;
print $q->start_html(
-title => 'A Questionaire',
-bgcolor => 'white');
}
# Outputs a footer line and end html tags
sub output_end($) {
my ($q) = #_;
print $q->div("My Web Form");
print $q->end_html;
}
# Displays the results of the form
sub display_results($) {
my ($q) = #_;
my $username = $q->param('user_name');
}
# Outputs a web form
sub output_form($) {
my ($q) = #_;
print $q->start_form(
-name => 'main',
-method => 'POST',
);
print $q->start_table;
print $q->Tr(
$q->td('Name:'),
$q->td(
$q->textfield(-name => "user_name", -size => 50)
)
);
print $q->Tr(
$q->td($q->submit(-value => 'Submit')),
$q->td(' ')
);
print $q->end_table;
print $q->end_form;
}
Style advice: you almost never need to assign 0 or 1 to a variable. Simply evaluate the value itself in bool context.
In CGI.pm (CGI), the param method merges POST and GET parameters, so we need to inspect the request method separately:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use CGI qw();
my $c = CGI->new;
print $c->header('text/plain');
if ('POST' eq $c->request_method && $c->param('dl')) {
# yes, parameter exists
} else {
# no
}
print 'Do not taunt happy fun CGI.';
With Plack::Request (PSGI), you have different methods for POST (body_parameters) and GET (query_parameters) in addition to the mixed interface (parameters):
#!/usr/bin/env plackup
use strict;
use warnings FATAL => 'all';
use Plack::Request qw();
my $app = sub {
my ($env) = #_;
my $req = Plack::Request->new($env);
if ($req->body_parameters->get_all('dl')) {
# yes
} else {
# no
}
return [200, [Content_Type => 'text/plain'], ['Do not taunt happy fun Plack.']];
};
Here's a good place to start: The Fool's Guide to CGI.pm,
the Perl module for CGI scripting.
This will show you "...how to get the POST value (from a submitted form) and assign it to a variable."
Hope this helps!
The above examples are bit complicated. The below code reads POST values into a variable. You can extract Key Value from that. If its GET then its better to use CGI module.
#!/usr/bin/perl
my $FormData = '';
read(STDIN, $FormData, $ENV{'CONTENT_LENGTH'});
## Variable $FormData holds all POST values passed
use CGI;
my $cgi = new CGI;
print $cgi->header();
print "$FormData";