Perl script for Downloading the file from web - perl

I am trying to automate one of my task where i have to download a last 5 releases of some softwares let say Google talk from http://www.filehippo.com/download_google_talk/.
I have never done such type of programming i mean, to interact with Web through perl .I have just read and came to know that through CGI module we can implement this thing so i tried with this module.
If some body can give me better advice then please you are welcome :)
My code :
#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use CGI::Carp qw/fatalsToBrowser/;
my $path_to_files = 'http://www.filehippo.com/download_google_talk/download/298ba15362f425c3ac48ffbda96a6156';
my $q = CGI->new;
my $file = $q->param('file') or error('Error: No file selected.');
print "$file\n";
if ($file =~ /^(\w+[\w.-]+\.\w+)$/) {
$file = $1;
}
else {
error('Error: Unexpected characters in filename.');
}
if ($file) {
download($file) or error('Error: an unknown error has occured. Try again.');
}
sub download
{
open(DLFILE, '<', "$path_to_files/$file") or return(0);
print $q->header(-type => 'application/x-download',
-attachment => $file,
'Content-length' => -s "$path_to_files/$file",
);
binmode DLFILE;
print while <DLFILE>;
close (DLFILE);
return(1);
}
sub error {
print $q->header(),
$q->start_html(-title=>'Error'),
$q->h1($_[0]),
$q->end_html;
exit(0);
}
In above code i am trying to print the file name which i wan to download but it is displaying error message.I am not able to figure it out why this error "Error: No file selected." is comming.

Sorry, but you are in the wrong track. Your best bet is this module: http://metacpan.org/pod/WWW::Mechanize
This page contain a lot of example to start with: http://metacpan.org/pod/WWW::Mechanize::Examples
It could be more elegant but I think this code easier to understand.
use strict;
use warnings;
my $path_to_files = 'http://www.filehippo.com/download_google_talk/download/298ba15362f425c3ac48ffbda96a6156';
my $mech = WWW::Mechanize->new();
$mech->get( $path_to_files );
$mech->save_content( "download_google_talk.html" );#save the base to see how it looks like
foreach my $link ( $mech->links() ){ #walk all links
print "link: $link\n";
if ($link =~ m!what_you_want!i){ #if it match
my $fname = $link;
$fname =~ s!\A.*/!! if $link =~ m!/!;
$fname .= ".zip"; #add extension
print "Download $link to $fname\n";
$mech->get($link,":content_file" => "$fname" );#download the file and stoore it in a fname.
}
}

Related

Where does PERL LWP::Simple getstore save the image?

I am trying to use perl getstore to get a list of image from URL after reading a text file containing the file names, I created the code and able to run it successfully but I do not know where is the file saved, i checked the disk size it and shows that every time i run the code the hard disk free space decrease, so i assume there are file saved but I can't find it. So where does perl getstore save file and what is the correct way to save image from a link ?
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Simple;
my $url = "https://labs.jamesooi.com/images/";
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36(KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36");
my $file = 'image-list.txt';
open (DATA, $file) or die "Could not open $file: $!";
while(<DATA>){
my $link = "$url" . "$_";
my $filename = "$_";
print $link;
print $filename;
my $req = HTTP::Request->new(GET => $link);
my $res = $ua->request($req);
if($res->is_success){
my $rc = getstore($link, $filename);
if(is_success($rc)){
print "Success\n";
}else{
print "Error\n";
}
} else {
print $res->status_line, "\n";
}
}
According to
the documentation,
getstore(url, file) takes the URL as the first argument and the second argument is the file name where the result is stored. If the file name is a relative path (it doesn't begin with a slash /) it will be relative to the current working directory.
But you read the name from a file and then treat the full line, including the newline character, as the file name. That's probably not what you want, so you should use chomp to remove the newline.
Apart from that:
You are doing first a GET request using LWP::UserAgent to retrieve the file but ignore the response and instead call getstore to retrieve and store the same resource if the first GET was successful. It would be simpler to either just save the result from the first GET or just skip it and use only getstore.
You are using DATA as a file handle. While this is not wrong, DATA is already an implicit file handle which points to the program file after the __DATA__ marker, so I recommend to use a different file handle.
When using a simplified version of the code the file gets successfully stored:
use strict;
use warnings;
use LWP::Simple;
my $url = "https://labs.jamesooi.com/images/";
my $file = 'image-list.txt';
open (my $fh, '<', $file) or die "Could not open $file: $!";
while ( <$fh> ) {
chomp; # remove the newline from the end of the line
my $link = $url . $_;
my $filename = $_;
my $rc = getstore($link, $filename);
if (is_success($rc)) {
print "Success\n";
}
else {
print "Error\n";
}
}

Perl HTTP::Request and Server Down

I have a perl script that is supposed to use HTTP::Response to grab an XML file and then parse it. The script works great, except when it can't reach the server its trying to get the info from.
I know I have to have an error checking and if an error does exist, use a return to continue on with the loop, I have done with with NET::SNMP and SSH, but I can't seem to get it working for this scenario. I'm about to be pull my hair in frustration. Any help is greatly appreciated.
#! /usr/bin/perl
use LWP::UserAgent;
use HTTP::Request::Common;
use Net::SNMP;
use XML::Simple;
use HTTP::Status;
$ua = LWP::UserAgent->new;
if ( open ( FH, "DeviceList.txt" ) )
{
while ( defined ( my $line = <FH> ) )
{
$line =~ s/\s+$//;
$device = $line;
&checkcon;
}
close FH;
}
else
{
print "DeviceList.txt file not found\n";
}
#exit;
sub checkcon
{
my ($req, $error) = HTTP::Request->new(GET => 'https://'.$device.'/getxml?location=/HelloWorld');
$ua->ssl_opts(SSL_verify_mode => SSL_VERIFY_NONE);
$ua->timeout(10);
$req->authorization_basic('test', 'test');
$test = $ua->request($req)->content;
print $req;
if ($test =~ "parser error") {
print "No Workie\n";
return;
}
#if (!is_success ($req))
#{
#print "No Workie!";
#return;
#}
# create object
my $xml = new XML::Simple;
# read XML file
my $data = $xml->XMLin("$test");
print "Looping";
# access XML data
`echo "$device,$data->{Hello}{World}{content}" >> Test.txt`;
}
exit 0;

Incorrect File Download using mechanize response in Perl

I created a script which access a URL with basic authentication. Once I've passed the credentials, it will download the file in my local folder. The problem is I got an incorrect filename. Here's my sample code:
#!/usr/bin/env perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
my $url = "http://sampleurl.com";
my $dir = 'C:\\pl';
my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech ->credentials("sampleurl.com:80", "sampleurl.com", "username", "password");
$mech->get($url);
my $res = $mech->res();
if($res->is_success){
my $filename = $res->filename();
print $filename;
$mech->save_content( $dir.'\\'.$filename, binmode => ':raw', decoded_by_headers => 1 );
print $mech->status;
}else{
print "Error";
}
exit 0;
Instead of downloading sample_url.DOC, it only downloaded sample with no file extension. can you help with my problem? I want to download the whole file.
There's no guarantee that $res->filename(); will produce a file extension or anything at all for that matter. The page you're currently reading doesn't have a filename extension for example.
You will have to guess a filename extension from the media type.
use MIME::Types qw(by_mediatype);
...
my $filename = $r->filename();
if(!$filename) { $filename = 'untitled'; }
if($filename !~ /\.[a-zA-Z0-9]{1,4}$/) {
my $type = $res->header('Content-Type');
my $ext = 'txt';
if($type) {
my #types = by_mediatype($type);
if($#types > -1) {
$ext = $types[0][0];
}
}
$filename .= '.' . $ext;
}
print $filename;

Open remote file via http

Is there any perl module like File::Remote, that works over http (read only)? Something like
$magic_module->open( SCRAPE, "http://somesite.com/");
while(<SCRAPE>)
{
#do something
}
Yes, of course. You can use LWP::Simple:
use LWP::Simple;
my $content = get $url;
Don't forget to check if the content is not empty:
die "Can't download $url" unless defined $content;
$content will be undef it some error occurred during downloading.
Also you can use File::Fetch module:
File::Fetch
->new(uri => 'http://google.com/robots.txt')
->fetch(to => \(my $file));
say($file);
With HTTP::Tiny:
use HTTP::Tiny qw();
my $response = HTTP::Tiny->new->get('http://example.com/');
if ($response->{success}) {
print $response->{content};
}
If you want unified interface to handle both local, remote (HTTP/FTP) and whatever else files, use IO::All module.
use IO::All;
# reading local
my $handle = io("file.txt");
while(defined(my $line = $handle->getline)){
print $line
}
# reading remote
$handle = io("http://google.com");
while(defined(my $line = $handle->getline)){
print $line
}

How to Call .pl File inside .cgi Script

i am using getpdftext.pl from CAM::PDF to extract pdf and print it to text, but in my web application i want to call this getpdftext.pl inside .cgi script. Can u suggest me as to what to do or how to proceed ahead. i tried converting getpdftext.pl to getpdftext.cgi but it doesnt work.
Thanks all
this is a extract from my request_admin.cgi script
my $filename = $q->param('quote');
:
:
:
&parsePdf($filename);
#function to extract text from pdf ,save it in a text file and parse the required fields
sub parsePdf($)
{
my $i;
print $_[0];
$filein = "quote_uploads/$_[0]";
$fileout = 'output.txt';
print "inside parsePdf\n";
open OUT, ">$fileout" or die "error: $!";
open IN, '-|', "getpdftext.pl $filein" or die "error :$!" ;
while(<IN>)
{
print "$i";
$i++;
print OUT;
}
}
It's highly likely that
Your CGI script's environment isn't complete enough to locate
getpdftext.pl and/or
The web-server user doesn't have permission to execute it anyway
Have a look in your web-server's error-log and see if it is reporting any pointers as to why this doesn't work.
In your particular case, it might be simpler and more direct to use CAM::PDF directly, which should have been installed along with getpdftext.pl anyway.
I had a look at this script and I think that your parsePdf sub could just as easily be written as:
#!/usr/bin/perl
use warnings;
use strict;
use CAM::PDF;
sub parsePdf {
my $filein = "quote_uploads/$_[0]";
my $fileout = 'output.txt';
open my $out_fh, ">$fileout" or die "error: $!";
my $doc = CAM::PDF->new($filein) || die "$CAM::PDF::errstr\n";
my $i = 0;
foreach my $p ($doc->rangeToArray(1,$doc->numPages()))
{
my $str = $doc->getPageText($p);
if (defined $str)
{
CAM::PDF->asciify(\$str);
print $i++;
print $out_fh $str;
}
}
}