I am trying to use perl getstore to get a list of image from URL after reading a text file containing the file names, I created the code and able to run it successfully but I do not know where is the file saved, i checked the disk size it and shows that every time i run the code the hard disk free space decrease, so i assume there are file saved but I can't find it. So where does perl getstore save file and what is the correct way to save image from a link ?
use strict;
use warnings;
use LWP::UserAgent;
use LWP::Simple;
my $url = "https://labs.jamesooi.com/images/";
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36(KHTML, like Gecko) Chrome/59.0.3071.104 Safari/537.36");
my $file = 'image-list.txt';
open (DATA, $file) or die "Could not open $file: $!";
while(<DATA>){
my $link = "$url" . "$_";
my $filename = "$_";
print $link;
print $filename;
my $req = HTTP::Request->new(GET => $link);
my $res = $ua->request($req);
if($res->is_success){
my $rc = getstore($link, $filename);
if(is_success($rc)){
print "Success\n";
}else{
print "Error\n";
}
} else {
print $res->status_line, "\n";
}
}
According to
the documentation,
getstore(url, file) takes the URL as the first argument and the second argument is the file name where the result is stored. If the file name is a relative path (it doesn't begin with a slash /) it will be relative to the current working directory.
But you read the name from a file and then treat the full line, including the newline character, as the file name. That's probably not what you want, so you should use chomp to remove the newline.
Apart from that:
You are doing first a GET request using LWP::UserAgent to retrieve the file but ignore the response and instead call getstore to retrieve and store the same resource if the first GET was successful. It would be simpler to either just save the result from the first GET or just skip it and use only getstore.
You are using DATA as a file handle. While this is not wrong, DATA is already an implicit file handle which points to the program file after the __DATA__ marker, so I recommend to use a different file handle.
When using a simplified version of the code the file gets successfully stored:
use strict;
use warnings;
use LWP::Simple;
my $url = "https://labs.jamesooi.com/images/";
my $file = 'image-list.txt';
open (my $fh, '<', $file) or die "Could not open $file: $!";
while ( <$fh> ) {
chomp; # remove the newline from the end of the line
my $link = $url . $_;
my $filename = $_;
my $rc = getstore($link, $filename);
if (is_success($rc)) {
print "Success\n";
}
else {
print "Error\n";
}
}
Related
The files are not downloading, please help.
#!/usr/bin/perl -w
require HTTP::Response;
require LWP::UserAgent;
open (INPUT, "ndb_id_file.txt") or die "can't open ndb_id_file.txt";
#input = <INPUT>
foreach $line(#input) {
$ua = LWP::UserAgent->new;
$ua->env_proxy('http');
$ua->proxy(['http', 'ftp'],'http://144020019:*******#netmon.****.ac.in:80');
response =
$ua->get('www.ndbserver.rutgers.edu/files/ftp/NDB/coordinates/na-biol/$line');
if ($response->is_success) {
$content = $response->content();
open(OUT,">$line.pdb") or die "Output file $line cannot be produced... Error...";
print (OUT "$content");
}
}
There are a number of problems with your program. The major ones being in this line
response = $ua->get('www.ndbserver.rutgers.edu/files/ftp/NDB/coordinates/na-biol/$line');
You are trying to assign to response, which is not a variable name
the value of $line is not being inserted into the URL because you are using single quotes
The contents of $line end with a linefeed, which should be removed using chomp
The URL has no scheme — it should start with http://
Apart from those points, you should fix these issues
You must always use strict and use warnings at the top of every Perl program you write. Adding -w on the shebang line is far inferior
You should use rather than require LWP::UserAgent. And there is no need to also use HTTP::Response as it is loaded as part of LWP
You should always use the three-parameter form of open with lexical file handles. And if the open fails you should print a die string that includes the value of $! which gives the reason for the failure
You should use while to read data from a file one line at a time, unless you have a good reason to need all of it in memory at once
There is no need to create a new user agent $ua every time around the loop. Just make one and use it to fetch every URL
You should use decoded_content instead of content to fetch the content of an HTTP::Response message in case it is compressed
Here is a program that includes all of those fixes. I haven't been able to test it but it does compile
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
my $in_file = 'ndb_id_file.txt';
open my $fh, '<', $in_file or die qq{Unable to open "$in_file" for input: $!};
my $ua = LWP::UserAgent->new;
$ua->env_proxy('http');
$ua->proxy(['http', 'ftp'], 'http://144020019:*******#netmon.****.ac.in:80');
while ( my $line = <$fh> ) {
chomp $line;
my $url = "http://www.ndbserver.rutgers.edu/files/ftp/NDB/coordinates/na-biol/$line";
my $response = $ua->get($url);
unless ( $response->is_success ) {
warn $response->status_line;
next;
}
my $content = $response->decoded_content;
my $out_file = "$line.pdb";
open my $out_fh, '>', $out_file or die qq{Unable to open "$out_file" for output: $!};
print $out_fh $content;
close $out_fh or die qq{Unable to close "$out_file": $!};
}
I want to offer my visitors a file for download to their local machine (e.g. the Download directory in case of Windows7).
The code below works perfectly well, but only if the file is located on the same machine as the script:
#!/usr/bin/perl
my $path = "samples/10000.mp3"; ##PATH_TO_FILE
my $file = "10000.mp3";
print "Content-Type:application/octet-stream; name=\"$file\"\r\n";
print "Content-Disposition: attachment; filename=\"$file\"\r\n\n";
open( FILE, $path );
while(read(FILE, $buffer, 100) ){
print("$buffer");
}
The problem is that the file in question is located on another machine, so I have to get the url for download. I thought the coding below would do the trick, but no matter what I try, I end up with a downloaded file of 0 bytes. Can someone please tell me what I am doing wrong?
#!/usr/bin/perl
use LWP::Simple;
my $url = 'http://<sampleurl>.com';
my $file = '10000.mp3';
my $path = get($url);
print "Content-Type:application/octet-stream; name=\"$file\"\r\n";
print "Content-Disposition: attachment; filename=\"$file\"\r\n\n";
open my $fh, '+>', $path;
while(read($fh, $buffer, 100) ){
print("$buffer");
}
The get method in LWP::Simple returns the content, not the path to a file containing the content.
Once you have the content bits, write them to the standard output along with the header. Change your second program to
#! /usr/bin/perl
use LWP::Simple;
my $url = 'http://<sampleurl>.com';
my $file = '10000.mp3';
my $bits = get($url);
die "$0: get $url failed" unless defined $bits;
binmode STDOUT or die "$0: binmode: $!";
print qq[Content-Type:application/octet-stream; name="$file"\r\n],
qq[Content-Disposition: attachment; filename="$file"\r\n],
qq[\r\n],
$bits;
I am using the following code to try to print a variable to file.
my $filename = "test/test.csv";
open FILE, "<$filename";
my $xml = get "http://someurl.com";
print $xml;
print FILE $xml;
close FILE;
So print $xml prints the correct output to the screen. But print FILE $xml doesn't do anything.
Why does the printing to file line not work? Perl seems to often have these things that just don't work...
For the print to file line to work, is it necessary that the file already exists?
The < opens a file for reading. Use > to open a file for writing (or >> to append).
It is also worthwhile adding some error handling:
use strict;
use warnings;
use LWP::Simple;
my $filename = "test/test.csv";
open my $fh, ">", $filename or die("Could not open file. $!");
my $xml = get "http://example.com";
print $xml;
print $fh $xml;
close $fh;
Im using LWP to download an executable file type and with the response in memory, i am able to hash the file. However how can i save this file on my system? I think i'm on the wrong track with what i'm trying below. The download is successful as i am able to generate the hash correctly (I've double checked it by downloading the actual file and comparing the hashes).
use strict;
use warnings;
use LWP::Useragent;
use Digest::MD5 qw( md5_hex );
use Digest::MD5::File qw( file_md5_hex );
use File::Fetch;
my $url = 'http://www.karenware.com/progs/pthasher-setup.exe';
my $filename = $url;
$filename =~ m/.*\/(.*)$/;
$filename = $1;
my $dir ='/download/two';
print "$filename\n";
my $ua = LWP::UserAgent->new();
my $response = $ua->get($url);
die $response->status_line if !$response->is_success;
my $file = $response->decoded_content( charset => 'none' );
my $md5_hex = md5_hex($file);
print "$md5_hex\n";
my $save = "Downloaded/$filename";
unless(open SAVE, '>>'.$save) {
die "\nCannot create save file '$save'\n";
}
print SAVE $file;
close SAVE;
If you are wondering why do i not instead download everything then parse the folder for each file and hash, its because im downloading all these files in a loop. And during each loop, i upload the relevant source URL (where this file was found) , along with the file name and hash into a database at one go.
Try getstore() from LWP::Simple
use strict;
use warnings;
use LWP::Simple qw(getstore);
use LWP::UserAgent;
use Digest::MD5 qw( md5_hex );
use Digest::MD5::File qw( file_md5_hex );
use File::Fetch;
my $url = 'http://www.karenware.com/progs/pthasher-setup.exe';
my $filename = $url;
$filename =~ m/.*\/(.*)$/;
$filename = $1;
my $dir ='/download/two';
print "$filename\n";
my $ua = LWP::UserAgent->new();
my $response = $ua->get($url);
die $response->status_line if !$response->is_success;
my $file = $response->decoded_content( charset => 'none' );
my $md5_hex = md5_hex($file);
print "$md5_hex\n";
my $save = "Downloaded/$filename";
getstore($url,$save);
getstore is an excellent solution, however for anyone else reading this response in a slightly different setup, it may not solve the issue.
First of all, you could quite possibly just be suffering from a binary/text issue.
I'd change
my $save = "Downloaded/$filename";
unless(open SAVE, '>>'.$save) {
die "\nCannot create save file '$save'\n";
}
print SAVE $file;
close SAVE;
to
my $save = "Downloaded/$filename";
open my $fh, '>>', $save or die "\nCannot create save file '$save' because $!\n";
# on platforms where this matters
# (like Windows) this is needed for
# 'binary' files:
binmode $fh;
print $fh $file;
close $fh;
The reason I like this better is that if you have set or acquired some settings on your browser object ($ua), they are ignored in LWP::Simple's getstore, as it uses its own browser.
Also, it uses the three parameter version of open which should be safer.
Another solution would be to use the callback method and store the file while you are downloading it, if for example you are dealing with a large file. The hashing algorithm would have to be changed so it is probably not relevant here but here's a sample:
my $req = HTTP::Request->new(GET => $uri);
open(my $fh, '>', $filename) or die "Could not write to '$filename': $!";
binmode $fh;
$res = $ua->request($req, sub {
my ($data, $response, $protocol) = #_;
print $fh $data;
});
close $fh;
And if the size is unimportant (and the hashing is done some other way) you could just ask your browser to store it directly:
my $req = HTTP::Request->new(GET => $uri);
$res = $ua->request($req, $filename);
i am using getpdftext.pl from CAM::PDF to extract pdf and print it to text, but in my web application i want to call this getpdftext.pl inside .cgi script. Can u suggest me as to what to do or how to proceed ahead. i tried converting getpdftext.pl to getpdftext.cgi but it doesnt work.
Thanks all
this is a extract from my request_admin.cgi script
my $filename = $q->param('quote');
:
:
:
&parsePdf($filename);
#function to extract text from pdf ,save it in a text file and parse the required fields
sub parsePdf($)
{
my $i;
print $_[0];
$filein = "quote_uploads/$_[0]";
$fileout = 'output.txt';
print "inside parsePdf\n";
open OUT, ">$fileout" or die "error: $!";
open IN, '-|', "getpdftext.pl $filein" or die "error :$!" ;
while(<IN>)
{
print "$i";
$i++;
print OUT;
}
}
It's highly likely that
Your CGI script's environment isn't complete enough to locate
getpdftext.pl and/or
The web-server user doesn't have permission to execute it anyway
Have a look in your web-server's error-log and see if it is reporting any pointers as to why this doesn't work.
In your particular case, it might be simpler and more direct to use CAM::PDF directly, which should have been installed along with getpdftext.pl anyway.
I had a look at this script and I think that your parsePdf sub could just as easily be written as:
#!/usr/bin/perl
use warnings;
use strict;
use CAM::PDF;
sub parsePdf {
my $filein = "quote_uploads/$_[0]";
my $fileout = 'output.txt';
open my $out_fh, ">$fileout" or die "error: $!";
my $doc = CAM::PDF->new($filein) || die "$CAM::PDF::errstr\n";
my $i = 0;
foreach my $p ($doc->rangeToArray(1,$doc->numPages()))
{
my $str = $doc->getPageText($p);
if (defined $str)
{
CAM::PDF->asciify(\$str);
print $i++;
print $out_fh $str;
}
}
}