Open remote file via http - perl

Is there any perl module like File::Remote, that works over http (read only)? Something like
$magic_module->open( SCRAPE, "http://somesite.com/");
while(<SCRAPE>)
{
#do something
}

Yes, of course. You can use LWP::Simple:
use LWP::Simple;
my $content = get $url;
Don't forget to check if the content is not empty:
die "Can't download $url" unless defined $content;
$content will be undef it some error occurred during downloading.

Also you can use File::Fetch module:
File::Fetch
->new(uri => 'http://google.com/robots.txt')
->fetch(to => \(my $file));
say($file);

With HTTP::Tiny:
use HTTP::Tiny qw();
my $response = HTTP::Tiny->new->get('http://example.com/');
if ($response->{success}) {
print $response->{content};
}

If you want unified interface to handle both local, remote (HTTP/FTP) and whatever else files, use IO::All module.
use IO::All;
# reading local
my $handle = io("file.txt");
while(defined(my $line = $handle->getline)){
print $line
}
# reading remote
$handle = io("http://google.com");
while(defined(my $line = $handle->getline)){
print $line
}

Related

Where does PERL LWP::Simple getstore save the image?

I am trying to use perl getstore to get a list of image from URL after reading a text file containing the file names, I created the code and able to run it successfully but I do not know where is the file saved, i checked the disk size it and shows that every time i run the code the hard disk free space decrease, so i assume there are file saved but I can't find it. So where does perl getstore save file and what is the correct way to save image from a link ?
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Simple;
my $url = "https://labs.jamesooi.com/images/";
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36(KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36");
my $file = 'image-list.txt';
open (DATA, $file) or die "Could not open $file: $!";
while(<DATA>){
my $link = "$url" . "$_";
my $filename = "$_";
print $link;
print $filename;
my $req = HTTP::Request->new(GET => $link);
my $res = $ua->request($req);
if($res->is_success){
my $rc = getstore($link, $filename);
if(is_success($rc)){
print "Success\n";
}else{
print "Error\n";
}
} else {
print $res->status_line, "\n";
}
}
According to
the documentation,
getstore(url, file) takes the URL as the first argument and the second argument is the file name where the result is stored. If the file name is a relative path (it doesn't begin with a slash /) it will be relative to the current working directory.
But you read the name from a file and then treat the full line, including the newline character, as the file name. That's probably not what you want, so you should use chomp to remove the newline.
Apart from that:
You are doing first a GET request using LWP::UserAgent to retrieve the file but ignore the response and instead call getstore to retrieve and store the same resource if the first GET was successful. It would be simpler to either just save the result from the first GET or just skip it and use only getstore.
You are using DATA as a file handle. While this is not wrong, DATA is already an implicit file handle which points to the program file after the __DATA__ marker, so I recommend to use a different file handle.
When using a simplified version of the code the file gets successfully stored:
use strict;
use warnings;
use LWP::Simple;
my $url = "https://labs.jamesooi.com/images/";
my $file = 'image-list.txt';
open (my $fh, '<', $file) or die "Could not open $file: $!";
while ( <$fh> ) {
chomp; # remove the newline from the end of the line
my $link = $url . $_;
my $filename = $_;
my $rc = getstore($link, $filename);
if (is_success($rc)) {
print "Success\n";
}
else {
print "Error\n";
}
}

Perl HTTP::Request and Server Down

I have a perl script that is supposed to use HTTP::Response to grab an XML file and then parse it. The script works great, except when it can't reach the server its trying to get the info from.
I know I have to have an error checking and if an error does exist, use a return to continue on with the loop, I have done with with NET::SNMP and SSH, but I can't seem to get it working for this scenario. I'm about to be pull my hair in frustration. Any help is greatly appreciated.
#! /usr/bin/perl
use LWP::UserAgent;
use HTTP::Request::Common;
use Net::SNMP;
use XML::Simple;
use HTTP::Status;
$ua = LWP::UserAgent->new;
if ( open ( FH, "DeviceList.txt" ) )
{
while ( defined ( my $line = <FH> ) )
{
$line =~ s/\s+$//;
$device = $line;
&checkcon;
}
close FH;
}
else
{
print "DeviceList.txt file not found\n";
}
#exit;
sub checkcon
{
my ($req, $error) = HTTP::Request->new(GET => 'https://'.$device.'/getxml?location=/HelloWorld');
$ua->ssl_opts(SSL_verify_mode => SSL_VERIFY_NONE);
$ua->timeout(10);
$req->authorization_basic('test', 'test');
$test = $ua->request($req)->content;
print $req;
if ($test =~ "parser error") {
print "No Workie\n";
return;
}
#if (!is_success ($req))
#{
#print "No Workie!";
#return;
#}
# create object
my $xml = new XML::Simple;
# read XML file
my $data = $xml->XMLin("$test");
print "Looping";
# access XML data
`echo "$device,$data->{Hello}{World}{content}" >> Test.txt`;
}
exit 0;

Perl script for Downloading the file from web

I am trying to automate one of my task where i have to download a last 5 releases of some softwares let say Google talk from http://www.filehippo.com/download_google_talk/.
I have never done such type of programming i mean, to interact with Web through perl .I have just read and came to know that through CGI module we can implement this thing so i tried with this module.
If some body can give me better advice then please you are welcome :)
My code :
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use CGI::Carp qw/fatalsToBrowser/;
my $path_to_files = 'http://www.filehippo.com/download_google_talk/download/298ba15362f425c3ac48ffbda96a6156';
my $q = CGI->new;
my $file = $q->param('file') or error('Error: No file selected.');
print "$file\n";
if ($file =~ /^(\w+[\w.-]+\.\w+)$/) {
$file = $1;
}
else {
error('Error: Unexpected characters in filename.');
}
if ($file) {
download($file) or error('Error: an unknown error has occured. Try again.');
}
sub download
{
open(DLFILE, '<', "$path_to_files/$file") or return(0);
print $q->header(-type => 'application/x-download',
-attachment => $file,
'Content-length' => -s "$path_to_files/$file",
);
binmode DLFILE;
print while <DLFILE>;
close (DLFILE);
return(1);
}
sub error {
print $q->header(),
$q->start_html(-title=>'Error'),
$q->h1($_[0]),
$q->end_html;
exit(0);
}
In above code i am trying to print the file name which i wan to download but it is displaying error message.I am not able to figure it out why this error "Error: No file selected." is comming.
Sorry, but you are in the wrong track. Your best bet is this module: http://metacpan.org/pod/WWW::Mechanize
This page contain a lot of example to start with: http://metacpan.org/pod/WWW::Mechanize::Examples
It could be more elegant but I think this code easier to understand.
use strict;
use warnings;
my $path_to_files = 'http://www.filehippo.com/download_google_talk/download/298ba15362f425c3ac48ffbda96a6156';
my $mech = WWW::Mechanize->new();
$mech->get( $path_to_files );
$mech->save_content( "download_google_talk.html" );#save the base to see how it looks like
foreach my $link ( $mech->links() ){ #walk all links
print "link: $link\n";
if ($link =~ m!what_you_want!i){ #if it match
my $fname = $link;
$fname =~ s!\A.*/!! if $link =~ m!/!;
$fname .= ".zip"; #add extension
print "Download $link to $fname\n";
$mech->get($link,":content_file" => "$fname" );#download the file and stoore it in a fname.
}
}

perl save a file downloaded by lwp

Im using LWP to download an executable file type and with the response in memory, i am able to hash the file. However how can i save this file on my system? I think i'm on the wrong track with what i'm trying below. The download is successful as i am able to generate the hash correctly (I've double checked it by downloading the actual file and comparing the hashes).
use strict;
use warnings;
use LWP::Useragent;
use Digest::MD5 qw( md5_hex );
use Digest::MD5::File qw( file_md5_hex );
use File::Fetch;
my $url = 'http://www.karenware.com/progs/pthasher-setup.exe';
my $filename = $url;
$filename =~ m/.*\/(.*)$/;
$filename = $1;
my $dir ='/download/two';
print "$filename\n";
my $ua = LWP::UserAgent->new();
my $response = $ua->get($url);
die $response->status_line if !$response->is_success;
my $file = $response->decoded_content( charset => 'none' );
my $md5_hex = md5_hex($file);
print "$md5_hex\n";
my $save = "Downloaded/$filename";
unless(open SAVE, '>>'.$save) {
die "\nCannot create save file '$save'\n";
}
print SAVE $file;
close SAVE;
If you are wondering why do i not instead download everything then parse the folder for each file and hash, its because im downloading all these files in a loop. And during each loop, i upload the relevant source URL (where this file was found) , along with the file name and hash into a database at one go.
Try getstore() from LWP::Simple
use strict;
use warnings;
use LWP::Simple qw(getstore);
use LWP::UserAgent;
use Digest::MD5 qw( md5_hex );
use Digest::MD5::File qw( file_md5_hex );
use File::Fetch;
my $url = 'http://www.karenware.com/progs/pthasher-setup.exe';
my $filename = $url;
$filename =~ m/.*\/(.*)$/;
$filename = $1;
my $dir ='/download/two';
print "$filename\n";
my $ua = LWP::UserAgent->new();
my $response = $ua->get($url);
die $response->status_line if !$response->is_success;
my $file = $response->decoded_content( charset => 'none' );
my $md5_hex = md5_hex($file);
print "$md5_hex\n";
my $save = "Downloaded/$filename";
getstore($url,$save);
getstore is an excellent solution, however for anyone else reading this response in a slightly different setup, it may not solve the issue.
First of all, you could quite possibly just be suffering from a binary/text issue.
I'd change
my $save = "Downloaded/$filename";
unless(open SAVE, '>>'.$save) {
die "\nCannot create save file '$save'\n";
}
print SAVE $file;
close SAVE;
to
my $save = "Downloaded/$filename";
open my $fh, '>>', $save or die "\nCannot create save file '$save' because $!\n";
# on platforms where this matters
# (like Windows) this is needed for
# 'binary' files:
binmode $fh;
print $fh $file;
close $fh;
The reason I like this better is that if you have set or acquired some settings on your browser object ($ua), they are ignored in LWP::Simple's getstore, as it uses its own browser.
Also, it uses the three parameter version of open which should be safer.
Another solution would be to use the callback method and store the file while you are downloading it, if for example you are dealing with a large file. The hashing algorithm would have to be changed so it is probably not relevant here but here's a sample:
my $req = HTTP::Request->new(GET => $uri);
open(my $fh, '>', $filename) or die "Could not write to '$filename': $!";
binmode $fh;
$res = $ua->request($req, sub {
my ($data, $response, $protocol) = #_;
print $fh $data;
});
close $fh;
And if the size is unimportant (and the hashing is done some other way) you could just ask your browser to store it directly:
my $req = HTTP::Request->new(GET => $uri);
$res = $ua->request($req, $filename);

How do I access gzipped files without creating additional processes?

My application reads and writes a lot of medium to large files. I would like to store these in zipped format. Saves diskspace and network time.
One way to do it is with this:
sub fopen {
my $mode = shift;
my $filename = shift;
if ($filename =~ /\.gz$/) {
if ($mode eq "<") {
open(my $fp, "-|", "/usr/bin/gzcat $filename");
#my $fp = gzopen($filename, "rb") ;
return $fp;
}
if ($mode eq ">") {
open(my $fp, "|-", "/usr/bin/gzip > $filename");
#my $fp = gzopen($filename, "wb") ;
return $fp;
}
} else {
open(my $fp, $mode, $filename);
return $fp;
}
}
I can then change my existing code simply by swapping the calls to open.
As is apparent from the function, I've also thought of using the zlib/compress library. The problem is that the result can't be passed around as a file pointer.
Is there a way to do this that doesn't involved creating a bunch of extra processes?
From the documentation of IO::Uncompress::Gunzip
use IO::Uncompress::Gunzip qw($GunzipError);
my $z = IO::Uncompress::Gunzip->new( $input )
or die "IO::Uncompress::Gunzip failed: $GunzipError\n";
The variable $z is now a file handle that you can use as usual.
while (<$z>) {...}
Just to add some information about previous answers, from an old bench I made, PerlIO::gzip is faster than IO::Uncompress::Gunzip.
Look at the IO::* namespace on your Perl version.
For example Debian old-stable (5 - Lenny) Perl and next versions, ships IO::Uncompress::Gunzip and IO::Uncompress::AnyUncompress.
#!/usr/bin/perl
use strict ;
use warnings ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);
my $input = "file1.txt.gz";
my $output = "file1.txt";
gunzip $input => $output
or die "gunzip failed: $GunzipError\n";