Perl: See how long it takes to download a file

I am trying to create a Perl script that tests a bunch of mirror servers to find the best mirror to use.
What is the best method to see how long it takes to download a file? I am trying to avoid shelling out like this:
$starttime = time();
$res = `wget -o/dev/null SITE.com/file.txt`;
$endtime = time();
$elapsed = $endtime - $starttime;
I'd rather use Perl functions.

You can use LWP::Simple or HTTP::Tiny (which has been in the Perl core since 5.14.0).
use strict;
use warnings;
use v5.10;                # for say()
use HTTP::Tiny;
use Time::HiRes qw(time); # to measure less than a second
use URI;

my $url = URI->new(shift);

# Add the HTTP scheme if the URL is schemeless.
$url = URI->new("http://$url") unless $url->scheme;

my $start    = time;
my $response = HTTP::Tiny->new->get($url);
my $total    = time - $start;

if( $response->{success} ) {
    say "It took $total seconds to fetch $url";
    say "The content was @{[ length $response->{content} ]} bytes";
}
else {
    say "Fetching $url failed: $response->{status} $response->{reason}";
}
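Applying the same timing idea to the original mirror-selection problem, here is a minimal sketch, assuming each mirror URL passed on the command line points at the same test file:

use strict;
use warnings;
use v5.10;
use HTTP::Tiny;
use Time::HiRes qw(time);

# Mirror URLs are passed as command-line arguments (an assumption for
# this sketch; adapt to however your mirror list is stored).
my @mirrors = @ARGV;

my %elapsed;
my $ua = HTTP::Tiny->new(timeout => 10);
for my $mirror (@mirrors) {
    my $start    = time;
    my $response = $ua->get($mirror);
    next unless $response->{success};    # skip mirrors that fail
    $elapsed{$mirror} = time - $start;
}

# Sort by elapsed time; the fastest mirror comes first.
my ($best) = sort { $elapsed{$a} <=> $elapsed{$b} } keys %elapsed;
say $best ? "Fastest mirror: $best ($elapsed{$best} s)" : "No mirror responded";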


GET request in Perl and "Use of uninitialized value"

my $url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=journal+of+medical+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]&usehistory=y";
print "\n before url \n";
print $url;
#post the esearch URL
my $output = get($url);
print $output;
I have never used Perl before.
If I hit this URL in the browser, I do get the XML.
However, from what I see in the script's output, $output is empty and
print $output;
returns
Use of uninitialized value in print at ./extractEmails.pl line 48.
Please suggest what's wrong and how to fix it.
Edit:
As suggested, complete code:
#!/usr/bin/perl -w
# A perl script written by Joseph Hughes, University of Glasgow
# use this perl script to parse the email addresses from the affiliations in PubMed
use strict;
use LWP::Simple;

my ($query, @queries);

#Query the Journal of Virology from 2014 until the present (use 3000)
$query = 'journal+of+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(@queries, $query);
#Journal of General Virology
$query = 'journal+of+general+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(@queries, $query);
#Virology
$query = 'virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(@queries, $query);
#Archives of Virology
$query = 'archives+of+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(@queries, $query);
#Virus Research
$query = 'virus+research[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(@queries, $query);
#Antiviral Research
$query = 'antiviral+research[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(@queries, $query);
#Viruses
$query = 'viruses[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(@queries, $query);
#Journal of Medical Virology
$query = 'journal+of+medical+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]';
push(@queries, $query);

# global variables
my %emails;
my $emailcnt = 0;
my $count    = 1;

#assemble the esearch URL
foreach my $query (@queries) {
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
    #my $url = $base . "esearch.fcgi?db=pubmed&term=$query&usehistory=y";
    my $url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=journal+of+medical+virology[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]&usehistory=y";
    print "\n before url \n";
    print $url;

    #post the esearch URL
    my $output = get($url);
    print "\n before output \n";
    print get($url);
    print $output;

    #parse WebEnv, QueryKey and Count (# records retrieved)
    my $web   = $1 if ($output =~ /<WebEnv>(\S+)<\/WebEnv>/);
    my $key   = $1 if ($output =~ /<QueryKey>(\d+)<\/QueryKey>/);
    my $count = $1 if ($output =~ /<Count>(\d+)<\/Count>/);

    #retrieve data in batches of 500
    my $retmax = 500;
    for (my $retstart = 0; $retstart < $count; $retstart += $retmax) {
        my $efetch_url = $base . "efetch.fcgi?db=pubmed&WebEnv=$web";
        $efetch_url .= "&query_key=$key&retmode=xml";
        my $efetch_out = get($efetch_url);
        my @matches = $efetch_out =~ m(<Affiliation>(.*)</Affiliation>)g;
        #print "$_\n" for @matches;
        for my $match (@matches) {
            if ($match =~ /\s([a-zA-Z0-9\.\_\-]+\@[a-zA-Z0-9\.\_\-]+)$/) {
                my $email = $1;
                $email =~ s/\.$//;
                $emails{$email}++;
            }
        }
    }
    my $cnt = keys %emails;
    print "$query\n$cnt\n";
}

print "Total number of emails: ";
my $cnt = keys %emails;
print "$cnt\n";

my @email = keys %emails;
my @VAR;
push @VAR, [ splice @email, 0, 100 ] while @email;
my $batch = 100;
foreach my $VAR (@VAR) {
    open(OUT, ">Set_$batch\.txt") || die "Can't open file!\n";
    print OUT join(",", @$VAR);
    close OUT;
    $batch = $batch + 100;
}
I recommend against using LWP::Simple for anything, because it is impossible to configure it or to handle errors usefully. LWP::UserAgent, which it wraps, is nearly as simple to use anyway (though the error handling is a bit more involved). The examples below would replace the use LWP::Simple; and my $output = get($url); lines.
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(timeout => 30);
my $response = $ua->get($url);
unless ($response->is_success) {
    # the Client-Warning, Client-Aborted, and X-Died headers
    # each may be set on client/transport errors
    die $response->status_line;
}
my $output = $response->decoded_content;
The core module HTTP::Tiny is also simple.
use strict;
use warnings;
use HTTP::Tiny;

my $ua = HTTP::Tiny->new;
my $response = $ua->get($url);
unless ($response->{success}) {
    # status 599 is HTTP::Tiny's pseudo-status for internal/transport
    # errors; the error text is then found in the content
    die $response->{status} == 599 ? $response->{content} : "$response->{status} $response->{reason}";
}
my $output = $response->{content};
If you really want an LWP::Simple approach that will at least report transport errors, try ojo from Mojolicious:
perl -Mojo -E'say g(shift)->text' http://example.com
In a script rather than a one-liner, you can use Mojo::UserAgent directly, and also handle HTTP errors as above:
use strict;
use warnings;
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;
my $response = $ua->get($url)->result;
unless ($response->is_success) {
    die $response->code . ' ' . $response->message;
}
my $output = $response->text;
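For completeness, here is a sketch of how the LWP::UserAgent replacement slots into the top of the asker's query loop (only one sample query is shown; the rest of the script is unchanged):

use strict;
use warnings;
use LWP::UserAgent;

# one sample query; the full script builds @queries as shown above
my @queries = ('viruses[journal]+AND+2014[Date+-+Publication]:3000[Date+-+Publication]');

my $ua = LWP::UserAgent->new(timeout => 30);
foreach my $query (@queries) {
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';
    my $url  = $base . "esearch.fcgi?db=pubmed&term=$query&usehistory=y";

    my $response = $ua->get($url);
    die $response->status_line unless $response->is_success;
    my $output = $response->decoded_content;

    # parse WebEnv, QueryKey and Count exactly as in the original script
    my ($web)   = $output =~ /<WebEnv>(\S+)<\/WebEnv>/;
    my ($key)   = $output =~ /<QueryKey>(\d+)<\/QueryKey>/;
    my ($count) = $output =~ /<Count>(\d+)<\/Count>/;
    print "Count for $query: $count\n";
}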

Perl script for downloading a file from the web

I am trying to automate one of my tasks, where I have to download the last 5 releases of some software, say Google Talk from http://www.filehippo.com/download_google_talk/.
I have never done this kind of programming before, I mean interacting with the web through Perl. I have just read about the CGI module and came to know we can implement this kind of thing through it, so I tried with that module.
If somebody can give me better advice, then please, you are welcome :)
My code:
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use CGI::Carp qw/fatalsToBrowser/;

my $path_to_files = 'http://www.filehippo.com/download_google_talk/download/298ba15362f425c3ac48ffbda96a6156';
my $q = CGI->new;
my $file = $q->param('file') or error('Error: No file selected.');
print "$file\n";

if ($file =~ /^(\w+[\w.-]+\.\w+)$/) {
    $file = $1;
}
else {
    error('Error: Unexpected characters in filename.');
}

if ($file) {
    download($file) or error('Error: an unknown error has occurred. Try again.');
}

sub download
{
    open(DLFILE, '<', "$path_to_files/$file") or return(0);
    print $q->header(-type => 'application/x-download',
                     -attachment => $file,
                     'Content-length' => -s "$path_to_files/$file",
    );
    binmode DLFILE;
    print while <DLFILE>;
    close (DLFILE);
    return(1);
}

sub error {
    print $q->header(),
          $q->start_html(-title => 'Error'),
          $q->h1($_[0]),
          $q->end_html;
    exit(0);
}
In the above code I am trying to print the name of the file I want to download, but it displays the error message instead. I am not able to figure out why the error "Error: No file selected." is coming.
Sorry, but you are on the wrong track. Your best bet is this module: http://metacpan.org/pod/WWW::Mechanize
This page contains a lot of examples to start with: http://metacpan.org/pod/WWW::Mechanize::Examples
It could be more elegant, but I think this code is easier to understand.
use strict;
use warnings;
use WWW::Mechanize;

my $path_to_files = 'http://www.filehippo.com/download_google_talk/download/298ba15362f425c3ac48ffbda96a6156';
my $mech = WWW::Mechanize->new();
$mech->get( $path_to_files );
$mech->save_content( "download_google_talk.html" ); #save the page to see what it looks like

foreach my $link ( $mech->links() ){         #walk all links
    my $url = $link->url;                    #links() returns WWW::Mechanize::Link objects
    print "link: $url\n";
    if ($url =~ m!what_you_want!i){          #if it matches
        my $fname = $url;
        $fname =~ s!\A.*/!! if $url =~ m!/!; #keep only the last path segment
        $fname .= ".zip";                    #add extension
        print "Download $url to $fname\n";
        $mech->get($url, ":content_file" => $fname); #download the file and store it in $fname
    }
}
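As a refinement, WWW::Mechanize can filter links for you with its find_all_links() method instead of the manual loop. A minimal sketch, where the url_regex pattern is a placeholder you would adapt to the actual download links on the page:

use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('http://www.filehippo.com/download_google_talk/');

# find_all_links() does the filtering; qr/download/i is a hypothetical
# pattern, not taken from the real page.
my @links = $mech->find_all_links( url_regex => qr/download/i );

for my $link (@links) {
    my $url = $link->url_abs;                # absolute URL as a URI object
    (my $fname = $url->path) =~ s!\A.*/!!;   # last path segment as filename
    print "Download $url to $fname\n";
    $mech->get( $url, ':content_file' => $fname );
}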

Strip Out The Last URL Name In String

I need to strip out the last page name from a variable.
Here is my code so far, but I lose the .jpg extension and I need to preserve it. Needed result: ImageName256.jpg
#!/usr/bin/perl
print "Content-type: text/html\n\n";
use CGI qw(:standard);
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
$Variable = "http://www.MyDomain.com/SomefolderPath/ImageName256.jpg";
($LastInUrl) = $Variable =~ m(.*/(\w+));
print $LastInUrl;
I usually avoid regexes for this sort of thing and handle it like this:
#!/usr/bin/perl
use strict;
use warnings;
use URI;
use URI::Escape qw( uri_unescape );
use File::Basename;
my $variable = "http://www.MyDomain.com/SomefolderPath/ImageName256.jpg";
my $last_in_url = uri_unescape( basename( URI->new( $variable )->path ) );
print $last_in_url;
Note: File::Basename's behaviour changes depending on which system it's used on. You'd have to use the following if you wanted portability:
my $fstype = fileparse_set_fstype('uri');
my $last = uri_unescape( basename( $uri->path ) );
fileparse_set_fstype($fstype);
Alternatively, URI can give you the last path segment directly:
use URI qw( );
my $uri = "http://www.MyDomain.com/SomefolderPath/ImageName256.jpg";
$uri = URI->new($uri);
my $basename = ( $uri->path_segments )[-1];
If you wanted to continue using a regex, you'd have to use the following:
use URI::Escape qw( uri_unescape );
my $uri = "http://www.MyDomain.com/SomefolderPath/ImageName256.jpg";
my $basename = $uri =~ m{/([^/#?]*)(?=[#?]|\z)} ? uri_unescape($1) : '';
Obviously, I strongly recommend the first solution over the second.
($LastInUrl) = (split qr{/}, $Variable)[-1];
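One caveat with the split approach: it keeps everything after the last slash, so a query string or fragment ends up in the result. A quick illustration and fix:

# The split approach breaks once the URL carries a query string:
my $with_query = "http://www.MyDomain.com/SomefolderPath/ImageName256.jpg?size=large";
my $name = (split qr{/}, $with_query)[-1];   # "ImageName256.jpg?size=large"

# Stripping everything from the first ? or # restores the bare name:
$name =~ s/[?#].*//;                         # "ImageName256.jpg"
print "$name\n";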

Open remote file via http

Is there any Perl module like File::Remote that works over HTTP (read only)? Something like
$magic_module->open( SCRAPE, "http://somesite.com/");
while (<SCRAPE>) {
    # do something
}
Yes, of course. You can use LWP::Simple:
use LWP::Simple;
my $content = get $url;
Don't forget to check that the content is defined:
die "Can't download $url" unless defined $content;
$content will be undef if some error occurred during downloading.
You can also use the File::Fetch module:
use File::Fetch;
File::Fetch
    ->new(uri => 'http://google.com/robots.txt')
    ->fetch(to => \(my $file));
say $file;
With HTTP::Tiny:
use HTTP::Tiny qw();
my $response = HTTP::Tiny->new->get('http://example.com/');
if ($response->{success}) {
    print $response->{content};
}
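If you specifically want a filehandle you can read line by line, as in the question, one trick is to fetch the content first and then open an in-memory filehandle on it. A sketch using HTTP::Tiny and Perl's scalar filehandles:

use strict;
use warnings;
use HTTP::Tiny;

my $response = HTTP::Tiny->new->get('http://example.com/');
die "Failed: $response->{status} $response->{reason}" unless $response->{success};

# Open an in-memory filehandle on the downloaded content so it can be
# read line by line, just like the SCRAPE handle in the question.
open my $fh, '<', \$response->{content} or die "Can't open in-memory file: $!";
while (my $line = <$fh>) {
    # do something with $line
    print $line;
}
close $fh;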
If you want a unified interface that handles local files, remote (HTTP/FTP) files, and whatever else, use the IO::All module.
use IO::All;

# reading local
my $handle = io("file.txt");
while(defined(my $line = $handle->getline)){
    print $line;
}

# reading remote
$handle = io("http://google.com");
while(defined(my $line = $handle->getline)){
    print $line;
}

Does somebody know a module with a function that returns the download speed in kB/s?

Does somebody know a module with a function that returns the download speed (for example, for eth0)?
Employ a monitoring program: atop, iftop, ntop, dstat, icinga, munin, knemo, or ksysguardd.
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes;
use LWP::Simple;

my $url  = 'http://www.cnn.com/';
my $file = 'cnn.html';

my $start = [Time::HiRes::gettimeofday()];
getstore($url, $file);
my $time = Time::HiRes::tv_interval($start);

# -s gives the file size in bytes, so this prints kilobytes per second
my $size = -s $file;
printf "Speed: %d kB/s\n", $size/$time/1000;
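As a variation, you can avoid the temporary file by downloading into memory and using the response length; a sketch with HTTP::Tiny (note this measures the throughput of a single HTTP fetch, not the interface's total traffic):

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;
use Time::HiRes qw(time);

my $url   = 'http://www.cnn.com/';
my $start = time;
my $response = HTTP::Tiny->new->get($url);
my $elapsed  = time - $start;

die "Fetch failed: $response->{status} $response->{reason}"
    unless $response->{success};

# length in bytes, divided by seconds and 1000, gives kB/s
my $bytes = length $response->{content};
printf "Speed: %.1f kB/s\n", $bytes / $elapsed / 1000;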