Perl download from URL to local drive

I want to offer my visitors a file for download to their local machine (e.g. the Downloads directory on Windows 7).
The code below works perfectly well, but only if the file is located on the same machine as the script:
#!/usr/bin/perl
my $path = "samples/10000.mp3"; ##PATH_TO_FILE
my $file = "10000.mp3";
print "Content-Type:application/octet-stream; name=\"$file\"\r\n";
print "Content-Disposition: attachment; filename=\"$file\"\r\n\n";
open( FILE, $path );
while(read(FILE, $buffer, 100) ){
print("$buffer");
}
The problem is that the file in question is located on another machine, so I have to fetch it from a URL. I thought the code below would do the trick, but no matter what I try, I end up with a downloaded file of 0 bytes. Can someone please tell me what I am doing wrong?
#!/usr/bin/perl
use LWP::Simple;
my $url = 'http://<sampleurl>.com';
my $file = '10000.mp3';
my $path = get($url);
print "Content-Type:application/octet-stream; name=\"$file\"\r\n";
print "Content-Disposition: attachment; filename=\"$file\"\r\n\n";
open my $fh, '+>', $path;
while(read($fh, $buffer, 100) ){
print("$buffer");
}

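The get method in LWP::Simple returns the content, not the path to a file containing the content. Your code passes that content to open as if it were a file name, and the '+>' mode truncates whatever it opens, so there is nothing left to read.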
Once you have the content bits, write them to the standard output along with the header. Change your second program to
#! /usr/bin/perl
use LWP::Simple;
my $url = 'http://<sampleurl>.com';
my $file = '10000.mp3';
my $bits = get($url);
die "$0: get $url failed" unless defined $bits;
binmode STDOUT or die "$0: binmode: $!";
print qq[Content-Type:application/octet-stream; name="$file"\r\n],
qq[Content-Disposition: attachment; filename="$file"\r\n],
qq[\r\n],
$bits;

Related

Where does PERL LWP::Simple getstore save the image?

I am trying to use Perl's getstore to download a list of images from a URL, after reading the file names from a text file. The code runs successfully, but I do not know where the files are saved. The free disk space decreases every time I run the code, so I assume files are being saved, but I can't find them. Where does getstore save files, and what is the correct way to save an image from a link?
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Simple;
my $url = "https://labs.jamesooi.com/images/";
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36(KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36");
my $file = 'image-list.txt';
open (DATA, $file) or die "Could not open $file: $!";
while(<DATA>){
my $link = "$url" . "$_";
my $filename = "$_";
print $link;
print $filename;
my $req = HTTP::Request->new(GET => $link);
my $res = $ua->request($req);
if($res->is_success){
my $rc = getstore($link, $filename);
if(is_success($rc)){
print "Success\n";
}else{
print "Error\n";
}
} else {
print $res->status_line, "\n";
}
}
According to the documentation, getstore(url, file) takes the URL as its first argument and the name of the file where the result is stored as its second. If the file name is a relative path (it doesn't begin with a slash /), it is relative to the current working directory.
But you read the name from a file and then treat the full line, including the newline character, as the file name. That's probably not what you want, so you should use chomp to remove the newline.
Apart from that:
You first issue a GET request with LWP::UserAgent to retrieve the file, ignore the response body, and then, if that GET was successful, call getstore to retrieve and store the same resource again. It would be simpler either to save the result of the first GET or to skip it and use only getstore.
You are using DATA as a file handle. While this is not wrong, DATA is already an implicit file handle that points to the part of the program file after the __DATA__ marker, so I recommend using a different file handle.
When using a simplified version of the code the file gets successfully stored:
use strict;
use warnings;
use LWP::Simple;
my $url = "https://labs.jamesooi.com/images/";
my $file = 'image-list.txt';
open (my $fh, '<', $file) or die "Could not open $file: $!";
while ( <$fh> ) {
chomp; # remove the newline from the end of the line
my $link = $url . $_;
my $filename = $_;
my $rc = getstore($link, $filename);
if (is_success($rc)) {
print "Success\n";
}
else {
print "Error\n";
}
}
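If the custom user agent settings from the question matter, the other option, saving the result of a single GET, might look roughly like the sketch below. It relies on the ':content_file' option of LWP::UserAgent's get method, which writes the response body straight to the named file. The save_url helper and the command-line handling are illustrative names only and are not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: fetch $url once and let LWP write the body to $filename.
# Returns the HTTP::Response so the caller can inspect the status.
sub save_url {
    my ($ua, $url, $filename) = @_;
    return $ua->get($url, ':content_file' => $filename);
}

# Usage (assumed invocation): perl save_url.pl URL FILENAME
if (@ARGV == 2) {
    my ($url, $filename) = @ARGV;
    my $ua = LWP::UserAgent->new;
    $ua->agent('Mozilla/5.0');    # settings on $ua apply to this request
    my $res = save_url($ua, $url, $filename);
    print $res->is_success ? "Success\n" : "Error: " . $res->status_line . "\n";
}
```

Unlike getstore, this goes through your own $ua object, so the agent string and any other settings you configured are used for the download.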

Decode with base64 a text file gzipped and read it

I retrieve from a system a text file which has been (in order):
gzipped
encoded with base64
So I would like to decode it, unzip it and read it in Perl without going through an intermediate file.
I tried the following:
use Compress::Zlib;
use MIME::Base64;
my $workingDir = "./log/";
my $inputFile = $workingDir . "log_result_base64.txt";
my $readtmp ='';
open (INPFIC, $inputFile) or die "ERROR: Impossible to open file ($inputFile)\n";
while (my $buf = <INPFIC> ) {
$readtmp .= decode_base64($buf);
}
close(INPFIC);
my $output = uncompress($readtmp);
print $output;
But it does not work, the $output variable is still undef.
[Edit]
I gave up on doing it purely with in-memory variables.
I changed my script by creating a new file at each stage:
#!/usr/bin/perl
use strict ;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;
use MIME::Base64;
my $workingDir = "./log/";
my $inputFile = $workingDir . "log_inbase64.txt";
my $inputFilegz = $workingDir . "log.txt.gz";
my $inputFileuncomp = $workingDir . "log.txt";
my @out;
my @readtmp;
my $readtmp;
# Reading the file encoded in base64
open (INPFIC, $inputFile) or die "ERROR: Impossible to open file ($inputFile)\n";
my @readtmp = <INPFIC>;
close(INPFIC);
$readtmp = join('',@readtmp);
# Decode the base64 to retrieve the gzip file
my $out = decode_base64($readtmp);
open my $fh, '>', $inputFilegz or die $!;
binmode $fh;
print $fh $out;
close $fh;
# Decompress the previously created gzip file
gunzip $inputFilegz => $inputFileuncomp
or die "gunzip failed: $GunzipError\n";
# Reading the Text file
open (INPFIC, $inputFileuncomp) or die "ERROR: Impossible to open file ($inputFileuncomp )\n";
my @out = <INPFIC>;
close(INPFIC);
The uncompress method does not work for gzipped data: it expects the zlib format rather than the gzip format.
IO::Uncompress::Gunzip can use scalar references instead of file names if you want to keep everything in memory.
Sample code:
use IO::Uncompress::Gunzip qw( gunzip $GunzipError );
use MIME::Base64 qw( decode_base64 );
my $tmp = decode_base64 do {
local $/;
<DATA>
};
gunzip \$tmp => \my $data or die "Could not gunzip: $GunzipError";
print $data;
__DATA__
H4sIAHWHLlUAAwvJyCxWAKLi/NxUhZLU4hKFlMSSRC4AsSDaaxcAAAA=
Should produce:
This is some test data
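To see the difference between the two formats without any files, here is a small round-trip sketch. It builds its own test input (gzip first, then base64, as in the question) and then decodes it in memory. It uses Compress::Zlib's memGzip and memGunzip functions, called by their full names, as an alternative to IO::Uncompress::Gunzip; the variable names are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Compress::Zlib;    # exports uncompress and $gzerrno
use MIME::Base64 qw(encode_base64 decode_base64);

# Build test input the way the question describes: gzip first, then base64.
my $text    = "This is some test data\n";
my $encoded = encode_base64(Compress::Zlib::memGzip($text));

# Step 1: undo the base64 layer, which leaves raw gzip data.
my $gzipped = decode_base64($encoded);

# uncompress() expects zlib-format data, so it rejects the gzip header.
my $wrong = uncompress($gzipped);
print "uncompress: ", (defined $wrong ? "ok" : "undef"), "\n";

# Step 2: memGunzip() understands the gzip format and works in memory.
my $plain = Compress::Zlib::memGunzip($gzipped);
die "memGunzip failed: $gzerrno\n" unless defined $plain;
print $plain;
```

Either memGunzip or the gunzip call with scalar references shown above should do; which one to use is mostly a matter of which module you already load.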
I'd put the whole file in a string before decoding:
local $/ = undef;
my $str = <INPFIC>;
my $dec = decode_base64 $str;
my $uncom = uncompress($dec);
According to the Compress::Zlib documentation, try opening and reading at the same time:
my $workingDir = "./log/";
my $inputFile = $workingDir . "log_result_base64.txt";
my $buffer;
my $output;
my $gz = gzopen($inputFile,"rb")
or die "Cannot open $inputFile: $gzerrno\n" ;
while ( $gz->gzread($buffer) > 0 ){
$output .= decode_base64 $buffer;
}
die "Error reading from $inputFile: $gzerrno" . ($gzerrno+0) . "\n"
if $gzerrno != Z_STREAM_END ;
$gz->gzclose();
print $output;

perl save a file downloaded by lwp

I'm using LWP to download an executable file, and with the response in memory I am able to hash the file. However, how can I save this file on my system? I think I'm on the wrong track with what I'm trying below. The download is successful, as I am able to generate the hash correctly (I've double-checked it by downloading the actual file and comparing the hashes).
use strict;
use warnings;
use LWP::UserAgent;
use Digest::MD5 qw( md5_hex );
use Digest::MD5::File qw( file_md5_hex );
use File::Fetch;
my $url = 'http://www.karenware.com/progs/pthasher-setup.exe';
my $filename = $url;
$filename =~ m/.*\/(.*)$/;
$filename = $1;
my $dir ='/download/two';
print "$filename\n";
my $ua = LWP::UserAgent->new();
my $response = $ua->get($url);
die $response->status_line if !$response->is_success;
my $file = $response->decoded_content( charset => 'none' );
my $md5_hex = md5_hex($file);
print "$md5_hex\n";
my $save = "Downloaded/$filename";
unless(open SAVE, '>>'.$save) {
die "\nCannot create save file '$save'\n";
}
print SAVE $file;
close SAVE;
If you are wondering why I don't instead download everything and then parse the folder to hash each file: I'm downloading all these files in a loop, and during each iteration I upload the relevant source URL (where the file was found), along with the file name and hash, into a database in one go.
Try getstore() from LWP::Simple
use strict;
use warnings;
use LWP::Simple qw(getstore);
use LWP::UserAgent;
use Digest::MD5 qw( md5_hex );
use Digest::MD5::File qw( file_md5_hex );
use File::Fetch;
my $url = 'http://www.karenware.com/progs/pthasher-setup.exe';
my $filename = $url;
$filename =~ m/.*\/(.*)$/;
$filename = $1;
my $dir ='/download/two';
print "$filename\n";
my $ua = LWP::UserAgent->new();
my $response = $ua->get($url);
die $response->status_line if !$response->is_success;
my $file = $response->decoded_content( charset => 'none' );
my $md5_hex = md5_hex($file);
print "$md5_hex\n";
my $save = "Downloaded/$filename";
getstore($url,$save);
getstore is an excellent solution, however for anyone else reading this response in a slightly different setup, it may not solve the issue.
First of all, you could quite possibly just be suffering from a binary/text issue.
I'd change
my $save = "Downloaded/$filename";
unless(open SAVE, '>>'.$save) {
die "\nCannot create save file '$save'\n";
}
print SAVE $file;
close SAVE;
to
my $save = "Downloaded/$filename";
open my $fh, '>>', $save or die "\nCannot create save file '$save' because $!\n";
# on platforms where this matters
# (like Windows) this is needed for
# 'binary' files:
binmode $fh;
print $fh $file;
close $fh;
The reason I like this better is that if you have set or acquired some settings on your browser object ($ua), they are ignored in LWP::Simple's getstore, as it uses its own browser.
Also, it uses the three parameter version of open which should be safer.
Another solution would be to use the callback method and store the file while you are downloading it, if for example you are dealing with a large file. The hashing algorithm would have to be changed so it is probably not relevant here but here's a sample:
my $req = HTTP::Request->new(GET => $uri);
open(my $fh, '>', $filename) or die "Could not write to '$filename': $!";
binmode $fh;
$res = $ua->request($req, sub {
my ($data, $response, $protocol) = @_;
print $fh $data;
});
close $fh;
And if the size is unimportant (and the hashing is done some other way) you could just ask your browser to store it directly:
my $req = HTTP::Request->new(GET => $uri);
$res = $ua->request($req, $filename);
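For the callback approach, the hashing change mentioned above could be handled with Digest::MD5's object interface, which accepts data in chunks. The following is only a sketch under that assumption: download_and_hash and the command-line handling are hypothetical names, and it has not been tried against the URL from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use Digest::MD5;

# Hypothetical helper: stream $url into $filename and return
# (HTTP::Response, MD5 hex digest of the bytes written).
sub download_and_hash {
    my ($ua, $url, $filename) = @_;

    open(my $fh, '>', $filename) or die "Could not write to '$filename': $!";
    binmode $fh;

    my $md5 = Digest::MD5->new;
    my $res = $ua->request(
        HTTP::Request->new(GET => $url),
        sub {
            my ($data, $response, $protocol) = @_;
            $md5->add($data);    # feed each chunk to the digest
            print {$fh} $data or die "Error writing to '$filename': $!";
        },
    );
    close $fh or die "Error closing '$filename': $!";

    return ($res, $md5->hexdigest);
}

# Usage (assumed invocation): perl download_and_hash.pl URL FILENAME
if (@ARGV == 2) {
    my ($res, $md5_hex) = download_and_hash(LWP::UserAgent->new, @ARGV);
    die $res->status_line, "\n" unless $res->is_success;
    print "$md5_hex\n";
}
```

This keeps only one chunk in memory at a time while still producing the file name, file and hash in a single pass, which fits the loop described in the question.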

How to Call .pl File inside .cgi Script

I am using getpdftext.pl from CAM::PDF to extract the text from a PDF and print it, but in my web application I want to call getpdftext.pl from inside a .cgi script. Can you suggest what to do or how to proceed? I tried converting getpdftext.pl to getpdftext.cgi, but it doesn't work.
Thanks, all.
This is an extract from my request_admin.cgi script:
my $filename = $q->param('quote');
:
:
:
&parsePdf($filename);
#function to extract text from pdf ,save it in a text file and parse the required fields
sub parsePdf($)
{
my $i;
print $_[0];
$filein = "quote_uploads/$_[0]";
$fileout = 'output.txt';
print "inside parsePdf\n";
open OUT, ">$fileout" or die "error: $!";
open IN, '-|', "getpdftext.pl $filein" or die "error :$!" ;
while(<IN>)
{
print "$i";
$i++;
print OUT;
}
}
It's highly likely that:
Your CGI script's environment isn't complete enough to locate getpdftext.pl, and/or
The web-server user doesn't have permission to execute it anyway.
Have a look in your web-server's error-log and see if it is reporting any pointers as to why this doesn't work.
In your particular case, it might be simpler and more direct to use CAM::PDF directly, which should have been installed along with getpdftext.pl anyway.
I had a look at this script and I think that your parsePdf sub could just as easily be written as:
#!/usr/bin/perl
use warnings;
use strict;
use CAM::PDF;
sub parsePdf {
my $filein = "quote_uploads/$_[0]";
my $fileout = 'output.txt';
open my $out_fh, ">$fileout" or die "error: $!";
my $doc = CAM::PDF->new($filein) || die "$CAM::PDF::errstr\n";
my $i = 0;
foreach my $p ($doc->rangeToArray(1,$doc->numPages()))
{
my $str = $doc->getPageText($p);
if (defined $str)
{
CAM::PDF->asciify(\$str);
print $i++;
print $out_fh $str;
}
}
}

perl file upload can't init filehandle

I tried to use this very simple script for uploading a file to my server. For some reason it is not working. I get the following message in my apache error log:
Use of uninitialized value in <HANDLE> at /opt/www/demo1/upload/image_upload_2.pl line 15.
readline() on unopened filehandle at /opt/www/demo1/upload/image_upload_2.pl line 15.
#!/usr/bin/perl -w
use CGI;
$upload_dir = "/opt/www/demo1/upload/data";
$query = new CGI;
$filename = $query->param("photo");
$filename =~ s/.*[\/\\](.*)/$1/;
$upload_filehandle = $query->upload("photo");
open UPLOADFILE, ">$upload_dir/$filename";
binmode UPLOADFILE;
while ( <$upload_filehandle> )
{
print UPLOADFILE;
}
close UPLOADFILE;
1
Any ideas what is wrong there?
Thanks
mx
File upload forms need to specify enctype="multipart/form-data". See W3C documentation.
In addition, note the following:
#!/usr/bin/perl
use strict; use warnings;
use CGI;
my $upload_dir = "/opt/www/demo1/upload/data";
my $query = CGI->new; # avoid indirect object notation
my $filename = $query->param("photo");
$filename =~ s/.*[\/\\](.*)/$1/; # this validation looks suspect
my $target = "$upload_dir/$filename";
# since you are reading binary data, use read to
# read chunks of a specific size
my $upload_filehandle = $query->upload("photo");
if ( defined $upload_filehandle ) {
my $io_handle = $upload_filehandle->handle;
# use lexical filehandles, 3-arg form of open
# check for errors after open
open my $uploadfile, '>', $target
or die "Cannot open '$target': $!";
binmode $uploadfile;
my $buffer;
while (my $bytesread = $io_handle->read($buffer,1024)) {
print $uploadfile $buffer
or die "Error writing to '$target': $!";
}
close $uploadfile
or die "Error closing '$target': $!";
}
See CGI documentation.
If you are uploading a text file, then the following should be set in the <head> of the HTML file:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Otherwise the $file_name = $query->param("file_name") is defined in scalar context (print $file_name) and undef in file context ( <$file_name> ).