Detect a broken link (web) in perl - perl

I'm trying to detect if a link is broken or not, as in if it's a web address I could paste into my browser and find a web page. I've tried two methods so far that I found online and both are giving me false positives (LWP::UserAgent and LWP::Simple).
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
my $url1 = 'http://www.gutenberg.org';
my $url2 = 'http://www.gooasdfzzzle.com.no/thisisnotarealsite';
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/8.0"); # Pretend to be Mozilla
my $req = HTTP::Request->new(GET => "$url1");
my $res = $ua->request($req);
if ($res->is_success) {
print "Success!\n";
} else {
print "Error: " . $res->status_line . "\n";
}
$req = HTTP::Request->new(GET => "$url2");
$res = $ua->request($req);
if ($res->is_success) {
print "Success!\n";
} else {
print "Error: " . $res->status_line . "\n";
}
Which is giving me output of:
Success!
Success!
and then there's
#!/usr/bin/perl -w
use strict;
use LWP::Simple;
my $url1 = 'http://www.gutenberg.org';
my $url2 = 'http://www.gooasdfzzzle.com.no/thisisnotarealsite';
if (head("$url1")) {
print "Yes\n";
} else {
print "No\n";
}
if (head("$url2")) {
print "Yes\n";
} else {
print "No\n";
}
Which is giving me an output of:
Yes
Yes
Am I missing something here?

Your code worked fine for me, I can only see a problem if your running behind a VPN or gateway as previous stated. Always use strict and warnings, and here is an alternative way so you are not initializing a new Request object everytime you want to check for a valid link.
use strict;
use warnings;
use LWP::UserAgent;
sub check_url {
my ($url) = #_;
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(HEAD => $url);
my $res = $ua->request($req);
return $res->status_line if $res->is_error;
return "Success: $url";
}

Related

Special character are contained in password and it breaks the URL

I run a curl request within my perl script and if password or user name contains any special character(&) it break the url. Please let me know how can i solve this.
my $json="curl --user $userName:$password http://git.sample.com/rest/api/1.0/projects/TEAMS/repos/ ";
my $decodedJson = decode_json($json);
Finally i implemented #simbabque way using WP::UserAgent.
use LWP::UserAgent;
my $ua = new LWP::UserAgent;
my $req = new HTTP::Request(GET => 'http://git.sample.com/rest/api/1.0/projects/TEAMS/repos/');
$req->authorization_basic($userName,$password);
my $response = $ua->request($req);
#handle error
unless ($response->is_success) {
die $response->status_line;
}
my $content = $response->decoded_content();
if (utf8::is_utf8($content)) {
binmode STDOUT,':utf8';
} else {
binmode STDOUT,':raw';
}
my $json=$content;
my $decodedJson = decode_json($json);

Where can I find the request body in HTTP::Server::Simple

I have the following simple server:
And I am trying to locate where the request body (or content) is.
I have tried dumping $self and $cgi but they didn't contain the field (I am asuming because they don't carry any information regarding the request)
How can I get the request body ?
package MyWebServer;
use strict;
use HTTP::Server::Simple::CGI;
use base qw(HTTP::Server::Simple::CGI);
use Data::Dumper;
my %dispatch = (
'/hello' => \&resp_hello,
# ...
);
sub handle_request {
my $self = shift;
my $cgi = shift;
my $path = $cgi->path_info();
my $handler = $dispatch{$path};
print "printing self in request".Dumper($cgi);
my $req = $cgi->get_request;
if (ref($handler) eq "CODE") {
print "HTTP/1.0 200 OK\r\n";
$handler->($cgi, "asd");
} else {
print "HTTP/1.0 404 Not found\r\n";
print $cgi->header,
$cgi->start_html('Not found'),
$cgi->h1('Not found'),
$cgi->end_html;
}
}
sub resp_hello($$) {
my ($cgi, $asd) = #_; # CGI.pm object
my $who = $cgi->param('name');
print $cgi->header,
$cgi->start_html("Hello"),
$cgi->h1("Hello world!!"),
$cgi->h2("Azdh $asd");
$cgi->end_html;
}
# start the server on port 8080
my $pid = MyWebServer->new(8081)->background();
print "Use 'kill $pid' to stop server.\n";
EDIT: Here is an example request:
use strict;
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => "http://localhost:8081/hello");
$req->content("<foo>3.14</foo>"); # the request body
my $resp = $ua->request($req);
if ($resp->is_success) {
my $message = $resp->decoded_content;
print "Received reply: $message\n";
}
else {
print "HTTP GET error code: ", $resp->code, "\n";
print "HTTP GET error message: ", $resp->message, "\n";
}
It's a bit old, but facing the same issue, here's the solution :
$cgi->param('POSTDATA');
That's all you need to retreive the Body contents.
cheers.
The request object you obtained using the line $req = $cgi->get_request is a CGI::Request object. Since this is a request object, it will have only attributes (parameters passed on to the request). Please note that only response objects will have content. So, to see all the parameters you have passed, you can use the as_string() object method as mentioned below.
print $req->as_string;
For more information about accessing individual parameters of the request object, please see CGI::Request documentation in http://search.cpan.org/~mrjc/cvswebedit-v2.0b1/cvs-web/lib/CGI/Request.pm.

how to post an url and to retrive the website contents using perl

I want to get a solution for how to post one particular url and to retrieve the content. Is it possible with perl ? let it be a website where we are searching for one particular id, and we are supposed to get the associated information tagged to that id.
Use this
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $server_endpoint = "";
# set custom HTTP request header fields
my $req = HTTP::Request->new(GET => $server_endpoint);
$req->header('content-type' => 'application/json');
my $resp = $ua->request($req);
if ($resp->is_success) {
my $message = $resp->decoded_content;
print "Received reply: $message\n";
}
else {
print "HTTP GET error code: ", $resp->code, "\n";
print "HTTP GET error message: ", $resp->message, "\n";
}
Or you can do this way also usingLWP::Simple
use strict;
use warnings;
use LWP::Simple;
my $url = 'http://www.example.com';
my $content = get $url or die "Unable to get $url\n";
print $content;

Download Specific Images

I'm trying to search and download specific images /front and back cover / of a website if found but whatever I do I always download only one of them. What should I change in my code to download both of them if found?
use strict;
use warnings;
use LWP;
use LWP::UserAgent;
use URI::Escape;
use HTTP::Status;
getCover(...);
sub getCover {
......
while ($title_found =~ /'(http:\/\/images.blu-ray.com\/movies\/covers\/\d+_.*?)'/gis) {
$url = getSite($1);
if ($title_found =~ /front/) {
$filename = 'front.jpg';
}
elsif ($title_found =~ /back/) {
$filename = 'back.jpg';
}
}
my $dir = 'somepath'.$filename;
open F, ">", $dir;
binmode F;
print F $url;
close F;
return 0;
}
sub getSite {
$url = shift;
print "URL: $url\n";
my $r;
my $ua = new LWP::UserAgent();
$ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312");
my $req = new HTTP::Request GET => $url;
$req->push_header("Accept-Language", "en");
$req = $ua->prepare_request($req);
my $res = $ua->request($req);
my $rc = $res->code;
if(is_success($rc)){
$r = $res->as_string();
$r = $res->content();
}
else {
print "Failed\n";
}
return $r;
}
Try putting the part that saves to 'somepath'.$filename inside the while loop instead of outside it.
Also, it appears that $title_found is supposed to contain multiple URLs. In that case, you need to save $1 to a temporary variable, and look for front/back in that instead of in $title_found. Otherwise, you'll wind up saving both covers to front.jpg.

How can I write a simple HTTP proxy in Perl?

I don't want to use the HTTP::Proxy package because I want to dump out a couple requests. My one liner looks like this, but breaks on trying to pass the header in:
perl -MData::Dumper -MHTTP::Daemon -MHTTP::Status -MLWP::UserAgent -e 'my $ua = LWP::UserAgent->new;my $d=new HTTP::Daemon(LocalPort=>1999);print "Please contact me at: <", $d->url, ">\n";while (my $c = $d->accept) {while (my $r = $c->get_request) {if ($r->method eq 'GET' and $r->url->path eq "/uploader") {$c->send_response("whatever.");print Dumper($r);}else{$response=$ua->request($r->method,"http://localhost:1996".$r->uri,$r->headers,$r->content);$c->send_response($response);}}}'
formatted, that's:
#perl -e '
use Data::Dumper;
use HTTP::Daemon;
use HTTP::Status;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $d=new HTTP::Daemon(LocalPort=>1999);
print "Please contact me at: < ", $d->url, " >\n";
while (my $c = $d->accept) {
while (my $r = $c->get_request) {
if ($r->method eq 'GET' and $r->url->path eq "/uploaded") {
$c->send_response("whatever.");
print Dumper($r);
} else {
$response = $ua -> request(
$r->method,
"http://localhost:1996" . $r->uri,
$r->headers,
$r->content);
$c->send_response($response);
}
}
}#'
So I can't just pass in the request, because I need to change the host, and I can't just pass in the headers it seems... so what should I do to keep it short.
So can anyone make this a better one-liner?
Aw shoot, I fixed it with this:
#perl -e '
use Data::Dumper;
use HTTP::Daemon;
use HTTP::Status;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $d=new HTTP::Daemon(LocalPort=>1999);
print "Please contact me at: < ", $d->url, " >\n";
while (my $c = $d->accept) {
while (my $r = $c->get_request) {
if ($r->method eq "GET" and $r->url->path eq "/uploaded") {
$c->send_response("whatever.");
print Dumper($r);
} else {
$response = $ua -> request( HTTP::Request->new(
$r->method,
"http://localhost:1996" . $r->uri,
$r->headers,
$r->content));
$c->send_response($response);
}
}
}#'
note the HTTP::Request->new yeah... so it works, it's a tad slow. but that's okay