How to verify HTTP::Response data is really complete 100% downloaded - perl

$url = "http://203.155.220.231/radar/pics/radarh.jpg";
use LWP::UserAgent;
$ua = new LWP::UserAgent;
$request = new HTTP::Request;
$request->method('GET');
$request->url($url);
$response = $ua->request($request);
if ( $response->is_error or
$response->header('Content-Type') ne 'image/jpeg' or
$response->header('Content-Length') ne length($response->content)
)
{
print $response->status_line . "\n";
print $response->header('Content-Length') . "\n";
print length($response->content) . "\n";
die "$!";
}
By checking $response->header('Content-Length') compare with length($response->content) is easiest way to verify data is really complete 100% downloaded ?

To check download complete, You should get total size of file before download. Then You can check download complete easily.

It depends on the response from the server.
If the server sends a Content-Length header then you could use this to verify the length. But if the server uses chunked Transfer-Encoding or simply closes the connection at the end of the response you cannot use this information. Such kind of responses are typical for dynamically generated content. And as far as I can see there are no information in these cases which let you determine if the download was complete or not.

Related

Detecting if internet is connected in perl

I have this perl script to extract the source code of a webpage:
#!/usr/bin/perl
use LWP::UserAgent;
my $ou = new LWP::UserAgent;
my $url = "http://google.com";
my $source = $ou->get("$url")->decoded_content;
print "$source\n";
Now, I want to check the internet status if it is connected or not before extracting the source code .
The simplest way to detect whether a remote server is off line is to attempt to connect to it. Using LWP to send a head request (instead of get) retrieves just the HTTP header information without any content, and you should get a swift response from any server that is on line
The default timeout of LWP::UserAgent object is three minutes, so you will need to set it to something much shorter for a rapid test
This program temporarily sets the timeout to 0.5 seconds, sends a head request, and reports that the server is not responding if the result is an error of any sort. The original timeout value is restored before carrying on
Depending on the real server that you want to test, you will need to adjust the timeout carefully to avoid getting false negatives
use strict;
use warnings 'all';
use constant URL => 'http://www.google.com/';
use LWP;
my $ua = LWP::UserAgent->new;
{
my $to = $ua->timeout(0.5);
my $res = $ua->head(URL);
unless ( $res->is_success ) {
die sprintf "%s is not responding (%s)\n", URL, $res->status_line;
}
$ua->timeout($to);
}

Better way to proxy an HTTP request using Perl HTTP::Response and LWP?

I need a Perl CGI script that fetches a URL and then returns the result of the fetch - the status, headers and content - unaltered to the CGI environment so that the "proxied" URL is returned by the web server to the user's browser as if they'd accessed the URL directly.
I'm running my script from cgi-bin in an Apache web server on an Ubuntu 14.04 host, but this question should be independent of server platform - anything that can run Perl CGI scripts should be able to do it.
I've tried using LWP::UserAgent::request() and I've got very close. It returns an HTTP::Response object that contains the status code, headers and content, and even has an "as_string" method that turns it into a human-readable form. The problem from a CGI perspective is that "as string" converts the status code to "HTTP/1.1 200 OK" rather than "Status: 200 OK", so the Apache server doesn't recognise the output as a valid CGI response.
I can fix this by using other methods in HTTP::Response to split out the various parts, but there seems to be no public way of getting at the encapsulated HTTP::Headers object in order to call its as_string method; instead I have to hack into the Perl blessed object hash and yank out the private "_headers" member directly. To me this seems slightly evil, so is there a better way?
Here's some code to illustrate the above. If you put it in your cgi-bin directory then you can call it as
http://localhost/cgi-bin/lwp-test?url=http://localhost/&http-response=1&show=1
You can use a different URL for testing if you want. If you set http-response=0 (or drop the param altogether) then you get the working piece-by-piece solution. If you set show=0 (or drop it) then the proxied request is returned by the script. Apache will return the proxied page if you have http-response=0 and will choke with a 500 Internal Server Error if it's 1.
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Simple;
use HTTP::Request;
use HTTP::Response;
use LWP::UserAgent;
my $q = CGI::Simple->new();
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => $q->param('url'));
my $res = $ua->request($req);
# print a text/plain header if called with "show=1" in the query string
# so proxied URL response is shown in browser, otherwise just output
# the proxied response as if it was ours.
if ($q->param('show')) {
print $q->header("text/plain");
print "\n";
}
if ($q->param('http-response')) {
# This prints the status as "HTTP/1.1 200 OK", not "Status: 200 OK".
print $res->as_string;
} else {
# This works correctly as a proxy, but using {_headers} to get at
# the private encapsulated HTTP:Response object seems a bit evil.
# There must be a better way!
print "Status: ", $res->status_line, "\n";
print $res->{_headers}->as_string;
print "\n";
print $res->content;
}
Please bear in mind that this script was written purely to demonstrate how to forward an HTTP::Response object to the CGI environment and bears no resemblance to my actual application.
You can go around the internals of the response object at $res->{_headers} by using the $res->headers method, that returns the actual HTTP::Headers instance that is used. HTTP::Response inherits that from HTTP::Message.
It would then look like this:
print "Status: ", $res->status_line, "\n";
print $res->headers->as_string;
That looks less evil, though it's still not pretty.
As simbabque pointed out, HTTP::Response has a headers method through inheritance from HTTP::Message. We can tidy up the handling of the status code by using HTTP::Response->header to push it into the embedded HTTP::Headers object, then use headers_as_string to print out the headers more cleanly. Here's the final script:-
#!/usr/bin/perl
use strict;
use warnings;
use CGI::Simple;
use HTTP::Request;
use HTTP::Response;
use LWP::UserAgent;
my $q = CGI::Simple->new();
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => $q->param('url'));
my $res = $ua->request($req);
# print a text/plain header if called with "show=1" in the query string
# so proxied URL response is shown in browser, otherwise just output
# the proxied response as if it was ours.
if ($q->param('show')) {
print $q->header("text/plain");
}
# $res->as_string returns the status in a "HTTP/1.1 200 OK" line rather than
# a "Status: 200 OK" header field so it can't be used for a CGI response.
# We therefore have a little more work to do...
# convert status from line to header field
$res->header("Status", $res->status_line);
# now print headers and content - don't forget a blank line between the two
print $res->headers_as_string, "\n", $res->content;

LWP Send Post request and get headers only in response

I have code like this
my $ua = new LWP::UserAgent;
$ua->timeout($timeout);
$ua->agent($useragent);
$response = $ua->post($domain,['login_name'=>$login,'login_password'=> $password])->as_string;
Content of page so large, thatI can't receive it. How to get only headers with sending post data?
I think this should do it for you.
my $ua = LWP::UserAgent->new();
$ua->timeout($timeout);
$ua->agent($useragent);
my $response = $ua->post(
$domain,
[ 'login_name' => $login, 'login_password' => $password ]
);
use Data::Dumper;
print Dumper( $response->headers() );
print $response->request()->content(), "\n";
To first, check if you can pass this login_name and login_password via HEAD (in url string: domain/?login_name=...&login_password=...). If this will not work, you are in bad case.
You cannot use POST with behavior of HEAD. LWP will wait full response.
Using POST the server will send you the content anyway, but you can avoid receiving all content using sockets tcp by yourself: gethostbyname, connect, sysread until you get /\r?\n\r?\n/ and close socket after this. Some traffic will be utilized anyway, but you can save memory and receive time.
Its not normal thing to do this with sockets, but sometimes when you have highload/big data - there is no better way than such mess.

Perl LWP::useragent capture server response headers

I'm querying a webserver for a document and I want to capture the both the document and the related server response headers (esp. Content-Type: ...). I have trouble finding out how to read the headers. Here are some sniplets from my Perl script, I left error checking out for clarity:
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->agent( 'requiredCustomUserAgent' ); # I'm required to set a custom user agent
$imageData = $response->content; # This is the received document
So at this point I can retrieve the web document, but I want to know what Content-Type the server sent with it. Unfortunately this is not always the same as the mime type found by the bash 'file' command. This file-method fails in case of .js or .css documents.
http://search.cpan.org/perldoc?HTTP::Response
use LWP::UserAgent;
my $ua = new LWP::UserAgent;
my $response = $ua->get("http://google.co.uk");
print $response->headers()->as_string;
print $response->header('content-type');
the thing that request returns contains a HTTP::Headers object, so look at the docs for HTTP::Headers to see how to use it. For instance
my $response = $ua->request($req);
my $headers = $response->headers();
my #header_field_names = $headers->header_field_names();
$logger->info("$_: ".$headers->header($_)) for grep {/Hogwarts/} #header_field_names;

Download only new/modified files with perl (or wget)

I have a Perl script which downloads a large number of files from a remote server. I'd like to avoid hammering the server, so I'd like to avoid downloading a file if it hasn't been modified since my last check. Is there a good way to do this, either in Perl or with a shell script?
Can I get the server to send HTTP 304 rather than HTTP 200 for unmodified files?
Yes, use LWP::UserAgent and pay special attention to the mirror method. This is also available in the procedural LWP::Simple as the mirror function.
From LWP's POD:
This method will get the document identified by $url and store it in file called $filename. If the file already exists, then the request will contain an "If-Modified-Since" header matching the modification time of the file. If the document on the server has not changed since this time, then nothing happens. If the document has been updated, it will be downloaded again. The modification time of the file will be forced to match that of the server.
The return value is the the response object.
HTTP 304 is the response code the server will return if you pass the If-Modified-Since test and your copy is fresh. LWP does this internally with mirror -- you needn't worry about it.
This is based on Evan Carrol's answer, but I'm going to elaborate in case this is useful for someone else. I stubbed out the response section; I doubt that part of my code will be interesting.
#!/usr/bin/perl -w
require HTTP::Date;
require LWP::UserAgent;
require Date::Parse;
my $lastChecked = '2009-01-01';
my $ua = LWP::UserAgent->new;
$ua->default_header('If-Modified-Since' => HTTP::Date::time2str(Date::Parse::str2time($lastChecked)));
my $response = $ua->get('http://example.com/');
if ($response->code == 304) {
print "No changes.\n";
} elsif ($response->is_success) {
print $response->decoded_content;
} else {
print "Response was error " . $response->code . ": '" . $response->status_line . "'\n";
}