Cloudflare protection is blocking the website connectivity in Mojo::UserAgent - perl

I just want to validate a website link, i.e. check whether it is connecting or not. The website is also included in the code. Please shed some light on this.
Here is my code:
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;
$ua->request_timeout(10); # give up on each request after 10 seconds

my $res = $ua->get('https://www.aba.com')->result;
if ($res->is_success) { print 'Success' }
elsif ($res->is_error) { print 'Failed ' . $res->message }
elsif ($res->code == 301) { print 'Redirect Success ' . $res->headers->location }
else { print 'Manual Check Required URL...' }
The above code is giving the following failed message:
Failed Service Temporarily Unavailable
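(Note for later readers: a 503 like this is often Cloudflare rejecting the default Mojolicious User-Agent string. If the block is purely User-Agent based, presenting a browser-like one may get through; a sketch, not a guaranteed fix:)

use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;
# replace the default "Mojolicious (Perl)" User-Agent with a browser-like one
$ua->transactor->name('Mozilla/5.0 (X11; Linux x86_64)');
my $res = $ua->get('https://www.aba.com')->result;
print $res->is_success ? 'Success' : 'Failed ' . $res->message;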

Anyhow, I resolved the issue; it may be useful to someone else:
use Mojo::Promise;
use Mojo::UserAgent;

my @urls = ('https://www.aba.com');
my $ua = Mojo::UserAgent->new;

my @gets = map {
    my $url = $_;
    $ua->get_p($url)->then(
        sub { print "Success -- valid -- $url" },
        sub { print "Failed -- $url" },
    );
} @urls;

Mojo::Promise->all(@gets)->wait;
The result is:
Success -- valid -- https://www.aba.com
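Note that get_p rejects its promise only on connection errors; an HTTP error response (4xx/5xx) still resolves it, so the success callback above would fire even for a blocked request. A minimal sketch that also checks the HTTP status via the transaction the promise resolves with:

use Mojo::Promise;
use Mojo::UserAgent;

my @urls = ('https://www.aba.com');
my $ua = Mojo::UserAgent->new;

my @gets = map {
    my $url = $_;
    $ua->get_p($url)->then(
        sub {
            my ($tx) = @_; # the promise resolves with the transaction
            my $code = $tx->result->code;
            print $code < 400 ? "Success -- valid -- $url ($code)"
                              : "Failed -- $url ($code)";
        },
        sub { print "Failed -- $url (connection error)" },
    );
} @urls;

Mojo::Promise->all(@gets)->wait;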

Related

Using HTTP::Server::Simple::CGI, how do I get the headers?

Basically, my question is similar to
How do I access HTTP request headers in HTTP::Server::Simple::CGI?
The answer was to use parse_headers(), but there was no example of how to use it properly. I tried to use parse_headers(), but I'm not getting any result; it just stops at parse_headers() as if the program were stuck. I couldn't add a comment on the question above since I don't have enough rep, so I created this new question.
Below is my sample code, basically the example code from CPAN with parse_headers() added:
#!/usr/bin/perl
{
package MyWebServer;
use HTTP::Server::Simple::CGI;
our @ISA = qw(HTTP::Server::Simple::CGI);
use Data::Dumper;
my %dispatch = (
'/hello.cgi' => \&resp_hello,
# ...
);
sub handle_request {
my $self = shift;
my $cgi = shift;
my $path = $cgi->path_info();
my $handler = $dispatch{$path};
my $header = $self->parse_headers();
# note: open() does not expand '~', so use $ENV{HOME} and a lexical filehandle
open my $fh, '>>', "$ENV{HOME}/MyWebServer.log" or die "Can't open log: $!";
my $dump = Data::Dumper->Dump([$header], [qw($header)]);
print {$fh} $dump;
close $fh;
if (ref($handler) eq "CODE") {
print "HTTP/1.0 200 OK\r\n";
$handler->($cgi);
} else {
print "HTTP/1.0 404 Not found\r\n";
print $cgi->header,
$cgi->start_html('Not found'),
$cgi->h1('Not found'),
$cgi->end_html;
}
}
sub resp_hello {
my $cgi = shift; # CGI.pm object
return if !ref $cgi;
my $who = $cgi->param('name');
print $cgi->header,
$cgi->start_html("Hello"),
$cgi->h1("Hello $who!"),
$cgi->end_html;
}
} # end of package MyWebServer
# start the server on port 8080
my $pid = MyWebServer->new(8080)->background();
print "Use 'kill $pid' to stop server.\n";
I only added this part:
my $header = $self->parse_headers();
open my $fh, '>>', "$ENV{HOME}/MyWebServer.log" or die "Can't open log: $!";
my $dump = Data::Dumper->Dump([$header], [qw($header)]);
print {$fh} $dump;
close $fh;
My objective is to get all the headers and dump them into a file.
Add
sub headers {
    my ($self, $headers) = @_;
    if ($headers) {
        $self->{__last_headers} = { @$headers };
    }
    return $self->{__last_headers};
}
Then inside handle_request use my $header = $self->headers();
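For example, handle_request can then dump the stashed headers to a file (a sketch; the log path is just an illustration):

sub handle_request {
    my ($self, $cgi) = @_;

    # the server calls headers() with the parsed header pairs before
    # dispatching to us, so the override above has already stashed them
    my $header = $self->headers();

    open my $fh, '>>', '/tmp/MyWebServer.log' or die "Can't open log: $!";
    print {$fh} Data::Dumper->Dump([$header], [qw($header)]);
    close $fh;

    # ... dispatch on $cgi->path_info() as before ...
}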
FWIW, I'm curious why you're using HTTP::Server::Simple::CGI instead of Mojolicious, Dancer, or even HTTP::Server::Simple::PSGI. PSGI (https://metacpan.org/pod/PSGI) gives you portability.
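For illustration, a minimal PSGI equivalent with HTTP::Server::Simple::PSGI might look like this (a sketch, not a drop-in replacement); in PSGI the request headers arrive ready-made as HTTP_* keys in the environment hash:

use HTTP::Server::Simple::PSGI;

my $app = sub {
    my $env = shift;
    # collect the incoming request headers from the PSGI environment
    my $headers = join "\n",
        map { "$_: $env->{$_}" } grep { /^HTTP_/ } sort keys %$env;
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ $headers ] ];
};

my $server = HTTP::Server::Simple::PSGI->new(8080);
$server->app($app);
$server->run;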

Where can I find the request body in HTTP::Server::Simple

I have the following simple server:
And I am trying to locate where the request body (or content) is.
I have tried dumping $self and $cgi, but they didn't contain the field (I am assuming because they don't carry any information regarding the request).
How can I get the request body ?
package MyWebServer;
use strict;
use HTTP::Server::Simple::CGI;
use base qw(HTTP::Server::Simple::CGI);
use Data::Dumper;
my %dispatch = (
'/hello' => \&resp_hello,
# ...
);
sub handle_request {
my $self = shift;
my $cgi = shift;
my $path = $cgi->path_info();
my $handler = $dispatch{$path};
print "printing self in request".Dumper($cgi);
my $req = $cgi->get_request;
if (ref($handler) eq "CODE") {
print "HTTP/1.0 200 OK\r\n";
$handler->($cgi, "asd");
} else {
print "HTTP/1.0 404 Not found\r\n";
print $cgi->header,
$cgi->start_html('Not found'),
$cgi->h1('Not found'),
$cgi->end_html;
}
}
sub resp_hello($$) {
my ($cgi, $asd) = @_; # CGI.pm object plus an extra argument
my $who = $cgi->param('name');
print $cgi->header,
$cgi->start_html("Hello"),
$cgi->h1("Hello world!!"),
$cgi->h2("Azdh $asd"),
$cgi->end_html;
}
# start the server on port 8081
my $pid = MyWebServer->new(8081)->background();
print "Use 'kill $pid' to stop server.\n";
EDIT: Here is an example request:
use strict;
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => "http://localhost:8081/hello");
$req->content("<foo>3.14</foo>"); # the request body
my $resp = $ua->request($req);
if ($resp->is_success) {
my $message = $resp->decoded_content;
print "Received reply: $message\n";
}
else {
print "HTTP GET error code: ", $resp->code, "\n";
print "HTTP GET error message: ", $resp->message, "\n";
}
It's a bit old, but having faced the same issue, here's the solution:
$cgi->param('POSTDATA');
That's all you need to retrieve the body contents.
Cheers.
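In context, a sketch of how that fits into handle_request (note that CGI.pm only populates POSTDATA when the POST body is not form-encoded; for PUT requests the parameter is PUTDATA):

sub handle_request {
    my ($self, $cgi) = @_;

    # raw request body, available for non-form-encoded POSTs
    my $body = $cgi->param('POSTDATA');
    print STDERR "Request body: $body\n" if defined $body;

    # ... dispatch as before ...
}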
The request object you obtained with the line $req = $cgi->get_request is a CGI::Request object. Since it is a request object, it only has attributes (the parameters passed with the request). Note that only response objects carry content. So, to see all the parameters you passed, you can use the as_string() method as shown below.
print $req->as_string;
For more information about accessing individual parameters of the request object, see the CGI::Request documentation at http://search.cpan.org/~mrjc/cvswebedit-v2.0b1/cvs-web/lib/CGI/Request.pm.

Detect a broken link (web) in perl

I'm trying to detect whether a link is broken, as in whether it's a web address I could paste into my browser and find a web page at. I've tried two methods so far that I found online (LWP::UserAgent and LWP::Simple), and both are giving me false positives (reporting a bad link as good).
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
my $url1 = 'http://www.gutenberg.org';
my $url2 = 'http://www.gooasdfzzzle.com.no/thisisnotarealsite';
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/8.0"); # Pretend to be Mozilla
my $req = HTTP::Request->new(GET => "$url1");
my $res = $ua->request($req);
if ($res->is_success) {
print "Success!\n";
} else {
print "Error: " . $res->status_line . "\n";
}
$req = HTTP::Request->new(GET => "$url2");
$res = $ua->request($req);
if ($res->is_success) {
print "Success!\n";
} else {
print "Error: " . $res->status_line . "\n";
}
Which is giving me output of:
Success!
Success!
and then there's
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
my $url1 = 'http://www.gutenberg.org';
my $url2 = 'http://www.gooasdfzzzle.com.no/thisisnotarealsite';
if (head("$url1")) {
print "Yes\n";
} else {
print "No\n";
}
if (head("$url2")) {
print "Yes\n";
} else {
print "No\n";
}
Which is giving me an output of:
Yes
Yes
Am I missing something here?
Your code worked fine for me; I can only see a problem if you're running behind a VPN or gateway, as previously stated. Always use strict and warnings. Here is an alternative that wraps the check in a subroutine, so you are not writing out a new request by hand every time you want to check a link.
use strict;
use warnings;
use LWP::UserAgent;
sub check_url {
my ($url) = @_;
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(HEAD => $url);
my $res = $ua->request($req);
return $res->status_line if $res->is_error;
return "Success: $url";
}
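Usage is then simply (same two URLs as in the question):

for my $url ('http://www.gutenberg.org', 'http://www.gooasdfzzzle.com.no/thisisnotarealsite') {
    print check_url($url), "\n";
}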

how to download `decoded_content`

UPDATED CODE (with resume functionality):
my $ua = LWP::UserAgent->new;
$ua->credentials('$ip:80', 'Realm', 'username', 'password');
my $response = $ua->mirror($url,$newfile);
if ($response->is_success) {
print "Download Successfull.";
}
else {
print "Error: " . $response->status_line;
}
OLD CODE:
my $ua = LWP::UserAgent->new;
$ua->credentials('$ip:80', 'Realm', 'username', 'password');
my $response = $ua->get($url);
if ($response->is_success) {
print "Retrieved " .length($response->decoded_content) .
" bytes of data.";
}
else {
print "Error: " . $response->status_line;
}
open my $fh, '>:encoding(UTF-8)', $tmp or die "Can't open $tmp: $!";
print {$fh} $response->decoded_content;
close $fh;
if ( -e $tmp ) {
my $filesize = ( stat $tmp )[7]; # element 7 is the size in bytes (9 is mtime)
my $origsize = $queue[$rec][1];
if ( $filesize < $origsize) {
print "Resuming download";
# ... code for resuming the partly downloaded file ...
}
else {
print "File downloaded correctly\n";
}
}
As I'm a newbie to Perl, I managed to download decoded_content, though some errors persist.
I need to resume the file download if a partial file exists.
This is the code I've tried, but I don't know where to start, so any quick thoughts would be a great help.
See method mirror in LWP::UserAgent. Documentation quote:
This method will get the document identified by $url and store it in file called $filename.
my $response = $ua->mirror($url, $filename); # no single quotes around variables!
See the source code for mirror; it deals correctly with truncated/partially downloaded files.
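For completeness: mirror sends an If-Modified-Since header based on the local file's timestamp, so an up-to-date copy comes back as 304 rather than as a success. A sketch using the question's variables:

my $response = $ua->mirror($url, $newfile);
if ($response->is_success) {
    print "Downloaded to $newfile\n";
}
elsif ($response->code == 304) {
    # 304 Not Modified: the local copy is already current
    print "Local copy is already up to date\n";
}
else {
    print "Error: ", $response->status_line, "\n";
}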

Why does my Perl script using WWW-Mechanize fail intermittently?

I am trying to write a Perl script using WWW-Mechanize.
Here is my code:
use DBI;
use JSON;
use WWW::Mechanize;
sub fetch_companies_list
{
my $url = shift;
my $browser = WWW::Mechanize->new( stack_depth => 0 );
my ($content, $json, $parsed_text, $company_name, $company_url);
eval
{
print "Getting the companies list...\n";
$browser->get( $url );
# die "Can't get the companies list.\n" unless( $browser->status );
$content = $browser->content();
# die "Can't get companies names.\n" unless( $browser->status );
$json = JSON->new;
$parsed_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode( $content );
foreach (@$parsed_text)
{
$company_name = $_->{name};
fetch_company_info( $company_name, $browser );
}
}
}
fetch_companies_list( "http://api.crunchbase.com/v/1/companies.js" );
The problem is as follows:
I start the script; it finishes fine.
I restart the script. The script fails in $browser->get().
I have to wait some time (about 5 minutes), then it will start working again.
I am working on Linux and have WWW-Mechanize version 1.66.
Any idea what might be the problem? I don't have any firewall installed either on computer or on my router.
Moreover, uncommenting the "die ..." line does not help, as it is stopping inside the get() call. I can try to upgrade to the latest, which is 1.71, but I'd like to know if someone else has experienced this with this Perl module.
What you're seeing is almost certainly the agent's timeout; LWP::UserAgent, which WWW::Mechanize is built on, defaults to 180 seconds. Exactly what timed out will be reported in the response's status line.
my $response = $mech->res;
if (!$response->is_success()) {
die($response->status_line());
}
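If you would rather fail fast than wait out the default, you can lower the timeout; WWW::Mechanize inherits timeout() from LWP::UserAgent. A sketch:

my $browser = WWW::Mechanize->new( stack_depth => 0 );
$browser->timeout(10); # give up after 10 seconds instead of the default 180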
This is a target-site issue. Right now it shows:
503 Service Unavailable. No server is available to handle this request.
Retry with a wait; try this:
## set maximum no of tries
my $retries = 10;
## number of secs to sleep
my $sleep = 1;
do {
eval {
print "Getting the companies list...\n";
$browser->get($url);
# die "Can't get the companies list.\n" unless( $browser->status );
$content = $browser->content();
# die "Can't get companies names.\n" unless( $browser->status );
$json = JSON->new;
$parsed_text = $json->allow_nonref->utf8->relaxed->escape_slash->loose->allow_singlequote->allow_barekey->decode($content);
foreach (@$parsed_text) {
$company_name = $_->{name};
fetch_company_info( $company_name, $browser );
}
};
if ($@) {
warn $#;
## rest for some time
sleep($sleep);
## increase the value of $sleep exponentially
$sleep *= 2;
}
} while ( $@ && $retries-- );
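With $sleep doubling after each failure, ten retries back off for roughly 17 minutes in total (1 + 2 + ... + 512 seconds), which is usually enough to ride out a transient 503.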