LWP::UserAgent - determining source of response code - PERL Modules - perl

When using the LWP::UserAgent module, one makes a request to a URL and
receives an HTTP::Response object which contains the response code
(hopefully 200!) and a status line.
My problem is that I can't figure out how to determine whether the
response code was returned from the webserver or from LWP::UserAgent.
For example, I believe that if the domain name doesn't resolve or you
simply cannot connect to the host, LWP::UserAgent reports this in the
form a 500 code, which is indistinguisable from a 500 "Internal Server
Error" code reported from the actual web server that's up but
experiencing some issues.
The problem is further amplified when going through a proxy server, as
there are now three possible "sources" of an error message:
the target webserver
the proxy server
the LWP::UserAgent library
How is one supposed to know if the 500 code means a) the server is up
but unhappy, b) the proxy could not connect to the server, or c)
LWP::UserAgent could not connect to the proxy?
I posted the same question here also:
http://www.justskins.com/forums/lwp-useragent-determining-source-43810.html

Error responses that LWP generates internally will have the
"Client-Warning" header set to the value "Internal response". If you
need to differentiate these internal responses from responses that a
remote server actually generates, you need to test this header value.
(from LWP::UserAgent -> REQUEST-METHODS)

#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use HTTP::Request;
use IO::Socket::SSL;
my $ua = LWP::UserAgent->new(
ssl_opts => {
SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_NONE
}
);
my $request = HTTP::Request->new(GET => "www.example.com");
my $response = $ua->request($request);
my $clientWarning = $response->header("Client-Warning");
if(defined $clientWarning and length($clientWarning) != 0) {
if($clientWarning =~ /Internal response/) {
print "$server UNAVAILABLE";
}
} else {
print "server AVAILABLE";
}

Related

Send HTTP streaming request in perl

I want to send xml request using HTTP streaming protocol . where transfer-encoding is "chunked". Currently i am using LWP::UserAgent to send the xml transaction.
my $userAgent = LWP::UserAgent->new;
my $starttime = time();
my $response = $userAgent->request(POST $url,
Content_Type => 'application/xml',
Transfer_Encoding => 'Chunked',
Content => $xml);
print "Response".Dumper($response);
But i am getting http status code 411 Length Required. Which means "client error response code indicates that the server refuses to accept the request without a defined "
How we can handle this while sending a request in chunked ?
LWP::UserAgent's API isn't designed to send a stream, but it is able to do so with minimal hacking.
use strict;
use warnings qw( all );
use HTTP::Request::Common qw( POST );
use LWP::UserAgent qw( );
my $ua = LWP::UserAgent->new();
# Don't provide any content.
my $request = POST('http://stackoverflow.org/junk',
Content_Type => 'application/xml',
);
# POST() insists on adding a Content-Length header.
# We need to remove it to get a chunked request.
$request->headers->remove_header('Content-Length');
# Here's where we provide the stream generator.
my $buffer = 'abc\n';
$request->content(sub {
return undef if !length($buffer); # Return undef when done.
return substr($buffer, 0, length($buffer), ''); # Return a chunk of data otherwise.
});
my $response = $ua->request($request);
print($response->status_line);
Using a proxy (Fiddler), we can see this does indeed send a chunked request:
There's no point in using a chunked request if you already have the entire document handy like in the example you gave. Instead, let's say wanted to upload the output of some external tool as it produced its output. To do that, you could use the following:
open(my $pipe, '-|:raw', 'some_tool');
$request->content(sub {
my $rv = sysread($pipe, my $buf, 64*1024);
die $! if !defined($rv);
return undef if !$rv;
return $buf;
});
But i am getting http status code 411 Length Required.
Not all servers understand a request with a chunked payload even though this is standardized in HTTP/1.1 (but not in HTTP/1.0). For example nginx only supports chunking within a request since version 1.3.9 (2012), see Is there a way to avoid nginx 411 Content-Length required errors?. If the server does not understand a request with chunked encoding there is nothing you can do from the client side, i.e. you simply cannot use chunked transfer encoding then. If you have control over the server make sure that the server actually supports it.
I've also never experienced browsers send such requests, probably since they cannot guarantee that the server will support such request. I've only seen mobile apps used where the server and app is managed by the same party and thus support for chunked requests can be guaranteed.

Detecting if internet is connected in perl

I have this perl script to extract the source code of a webpage:
#!/usr/bin/perl
use LWP::UserAgent;
my $ou = new LWP::UserAgent;
my $url = "http://google.com";
my $source = $ou->get("$url")->decoded_content;
print "$source\n";
Now, I want to check the internet status if it is connected or not before extracting the source code .
The simplest way to detect whether a remote server is off line is to attempt to connect to it. Using LWP to send a head request (instead of get) retrieves just the HTTP header information without any content, and you should get a swift response from any server that is on line
The default timeout of LWP::UserAgent object is three minutes, so you will need to set it to something much shorter for a rapid test
This program temporarily sets the timeout to 0.5 seconds, sends a head request, and reports that the server is not responding if the result is an error of any sort. The original timeout value is restored before carrying on
Depending on the real server that you want to test, you will need to adjust the timeout carefully to avoid getting false negatives
use strict;
use warnings 'all';
use constant URL => 'http://www.google.com/';
use LWP;
my $ua = LWP::UserAgent->new;
{
my $to = $ua->timeout(0.5);
my $res = $ua->head(URL);
unless ( $res->is_success ) {
die sprintf "%s is not responding (%s)\n", URL, $res->status_line;
}
$ua->timeout($to);
}

Better way to proxy an HTTP request using Perl HTTP::Response and LWP?

I need a Perl CGI script that fetches a URL and then returns the result of the fetch - the status, headers and content - unaltered to the CGI environment so that the "proxied" URL is returned by the web server to the user's browser as if they'd accessed the URL directly.
I'm running my script from cgi-bin in an Apache web server on an Ubuntu 14.04 host, but this question should be independent of server platform - anything that can run Perl CGI scripts should be able to do it.
I've tried using LWP::UserAgent::request() and I've got very close. It returns an HTTP::Response object that contains the status code, headers and content, and even has an "as_string" method that turns it into a human-readable form. The problem from a CGI perspective is that "as string" converts the status code to "HTTP/1.1 200 OK" rather than "Status: 200 OK", so the Apache server doesn't recognise the output as a valid CGI response.
I can fix this by using other methods in HTTP::Response to split out the various parts, but there seems to be no public way of getting at the encapsulated HTTP::Headers object in order to call its as_string method; instead I have to hack into the Perl blessed object hash and yank out the private "_headers" member directly. To me this seems slightly evil, so is there a better way?
Here's some code to illustrate the above. If you put it in your cgi-bin directory then you can call it as
http://localhost/cgi-bin/lwp-test?url=http://localhost/&http-response=1&show=1
You can use a different URL for testing if you want. If you set http-response=0 (or drop the param altogether) then you get the working piece-by-piece solution. If you set show=0 (or drop it) then the proxied request is returned by the script. Apache will return the proxied page if you have http-response=0 and will choke with a 500 Internal Server Error if it's 1.
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Simple;
use HTTP::Request;
use HTTP::Response;
use LWP::UserAgent;
my $q = CGI::Simple->new();
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => $q->param('url'));
my $res = $ua->request($req);
# print a text/plain header if called with "show=1" in the query string
# so proxied URL response is shown in browser, otherwise just output
# the proxied response as if it was ours.
if ($q->param('show')) {
print $q->header("text/plain");
print "\n";
}
if ($q->param('http-response')) {
# This prints the status as "HTTP/1.1 200 OK", not "Status: 200 OK".
print $res->as_string;
} else {
# This works correctly as a proxy, but using {_headers} to get at
# the private encapsulated HTTP:Response object seems a bit evil.
# There must be a better way!
print "Status: ", $res->status_line, "\n";
print $res->{_headers}->as_string;
print "\n";
print $res->content;
}
Please bear in mind that this script was written purely to demonstrate how to forward an HTTP::Response object to the CGI environment and bears no resemblance to my actual application.
You can go around the internals of the response object at $res->{_headers} by using the $res->headers method, that returns the actual HTTP::Headers instance that is used. HTTP::Response inherits that from HTTP::Message.
It would then look like this:
print "Status: ", $res->status_line, "\n";
print $res->headers->as_string;
That looks less evil, though it's still not pretty.
As simbabque pointed out, HTTP::Response has a headers method through inheritance from HTTP::Message. We can tidy up the handling of the status code by using HTTP::Response->header to push it into the embedded HTTP::Headers object, then use headers_as_string to print out the headers more cleanly. Here's the final script:-
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Simple;
use HTTP::Request;
use HTTP::Response;
use LWP::UserAgent;
my $q = CGI::Simple->new();
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => $q->param('url'));
my $res = $ua->request($req);
# print a text/plain header if called with "show=1" in the query string
# so proxied URL response is shown in browser, otherwise just output
# the proxied response as if it was ours.
if ($q->param('show')) {
print $q->header("text/plain");
}
# $res->as_string returns the status in a "HTTP/1.1 200 OK" line rather than
# a "Status: 200 OK" header field so it can't be used for a CGI response.
# We therefore have a little more work to do...
# convert status from line to header field
$res->header("Status", $res->status_line);
# now print headers and content - don't forget a blank line between the two
print $res->headers_as_string, "\n", $res->content;

LWP Send Post request and get headers only in response

I have code like this
my $ua = new LWP::UserAgent;
$ua->timeout($timeout);
$ua->agent($useragent);
$response = $ua->post($domain,['login_name'=>$login,'login_password'=> $password])->as_string;
Content of page so large, thatI can't receive it. How to get only headers with sending post data?
I think this should do it for you.
my $ua = LWP::UserAgent->new();
$ua->timeout($timeout);
$ua->agent($useragent);
my $response = $ua->post(
$domain,
[ 'login_name' => $login, 'login_password' => $password ]
);
use Data::Dumper;
print Dumper( $response->headers() );
print $response->request()->content(), "\n";
To first, check if you can pass this login_name and login_password via HEAD (in url string: domain/?login_name=...&login_password=...). If this will not work, you are in bad case.
You cannot use POST with behavior of HEAD. LWP will wait full response.
Using POST the server will send you the content anyway, but you can avoid receiving all content using sockets tcp by yourself: gethostbyname, connect, sysread until you get /\r?\n\r?\n/ and close socket after this. Some traffic will be utilized anyway, but you can save memory and receive time.
Its not normal thing to do this with sockets, but sometimes when you have highload/big data - there is no better way than such mess.

How to detect a changed webpage?

In my application, I fetch webpages periodically using LWP. Is there anyway to check whether between two consecutive fetches the webpage has got changed in some respect (other than explicitly doing a comparison) ? Is there any signature(say CRC) that is being generated at lower protocol layers which can be extracted and compared against older signatures to see possible changes ?
There are two possible approaches. One is to use a digest of the page, e.g.
use strict;
use warnings;
use Digest::MD5 'md5_hex';
use LWP::UserAgent;
# fetch the page, etc.
my $digest = md5_hex $response->decoded_content;
if ( $digest ne $saved_digest ) {
# the page has changed.
}
Another option is to use an HTTP ETag, if the server provides one for the resource requested. You can simply store it and then set your request headers to include an If-None-Match field on subsequent requests. If the server ETag has remained the same, you'll get a 304 Not Modified status and an empty response body. Otherwise you'll get the new page. (And new ETag.) See Entity Tags in RFC2616.
Of course, the server could be lying, and sending the same ETag even though the content has changed. There's no way to know unless you look.
You should use the If-Modified-Since request header, noting the gotchas in the RFC. You send this header with the request. If the server supports it and thinks the content is newer, it sends it to you. If it thinks you have the most recent version, it returns a 304 with no message body.
However, as other answers have noted, the server doesn't have to tell you the truth, so you're sometimes stuck downloading the content and checking yourself. Many dynamic things will always claim to have new content because many developers have never thought about supporting basic HTTP things in their web apps.
For the LWP bits, you can create a single request with an extra header:
use HTTP::Request;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new( GET => $url );
$r->header( 'If-Modified-Since' => $time );
$ua->request( $request );
For all requests, you can set a request handler:
$ua->add_handler(
request_send => sub {
my($request, $ua, $h) = #_;
# ... look up time from local store
$r->header( 'If-Modified-Since' => $time );
}
);
However, LWP can do most of this for you with mirror if you want to save the files:
$ua->mirror( $url, $filename )