Perl / Unix download pdf from website - perl

post #2 :) don't worry I don't intend to count them all...
Is there an easy way to download a PDF file from a website using a Perl or shell script?
Say I have a URL such as:
http://www.cs.middlebury.edu/~briggs/Courses/CS201-F12/js/js.pdf
Actually, I will have a cron job that runs daily to download a PDF file from a website.
Any help?
Thanks

Look at wget or curl. Example: wget <URL> -O <output file>

The LWP set of modules has a cut-down version LWP::Simple which allows this sort of thing to be done very simply.
use strict;
use warnings;
use LWP::Simple 'getstore';
my $resp = getstore('http://www.cs.middlebury.edu/~briggs/Courses/CS201-F12/js/js.pdf', 'js.pdf');
print $resp, "\n";
The value of $resp is the HTTP status code and should normally be 200 for a successful operation.
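Because the download is meant to run unattended from cron, it may help to turn that status code into an exit status. The following is a minimal sketch, assuming the script is called with the URL and the output file name as arguments; the helper name download_pdf and that calling convention are illustrative, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Hypothetical helper: fetch $url into $file and report success as a boolean.
sub download_pdf {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);    # HTTP status code, e.g. 200
    warn "Download of $url failed with status $status\n"
        unless is_success($status);
    return is_success($status);
}

# Run only when called as: download.pl <URL> <output file>
if (@ARGV == 2) {
    exit(download_pdf(@ARGV) ? 0 : 1);
}
```

A non-zero exit status and the message on STDERR give cron something to report when the download fails, instead of silently leaving a stale or missing file.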

Related

Unable to get page via HTTPS with LWP::Simple in Perl

I try to download a page from an HTTPS URL with Perl:
use LWP::Simple;
my $url = 'https://www.ferc.gov/xml/whats-new.xml';
my $content = get $url or die "Unable to get $url\n";
print $content;
There seems to be a problem. Just can't figure out the error. I can't get the page. Is the get request improperly coded? Do I need to use a user agent?
LWP::Protocol::https is needed to make HTTPS requests with LWP. It needs to be installed separately from the rest of LWP. It looks like you installed LWP, but not LWP::Protocol::https, so simply install it now.
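LWP::Simple's get returns undef on failure without saying why, which makes this kind of problem hard to spot. A minimal sketch, assuming you are free to switch to LWP::UserAgent, which exposes the status line of the failed request; the helper name fetch_page is hypothetical.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: returns the page body, or dies with the reason.
sub fetch_page {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die "Unable to get $url: ", $res->status_line, "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Usage: perl fetch.pl <URL>
print fetch_page($ARGV[0]) if @ARGV;
```

To check whether the HTTPS module is installed at all, perl -MLWP::Protocol::https -e 1 should exit silently when it is present and complain that it cannot locate the module when it is not.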

I could not download specific page via perl get, bash command GET and wget

I have an issue downloading a page:
my $url='http://www.ncbi.nlm.nih.gov/nuccore?linkname=pcassay_nucleotide&from_aid=504934,1806,1805,1674';
I can open this URL in a browser, but not when I run a bash command from Perl or from a Linux shell:
GET $url >OUTPUT1; # it does not even write anything to the file "OUTPUT1"
When I try wget, it downloads something, but not the correct page: I get <title>Error - Nucleotide - NCBI</title>. I want the page with the items, but it returns a page without them.
my $html = qx{wget --quiet --output-document=OUTPUT1 $url};
Note: I noticed a few minutes ago that the URL works in Mozilla Firefox but cannot be opened in Google Chrome. That is weird, and my issue is probably related to it. Any ideas?
Code from link:
my $url='http://www.ncbi.nlm.nih.gov/nuccore?linkname=pcassay_nucleotide&from_aid=504934,1806,1805,1674';
my $html = qx{wget --quiet --output-document=OUTPUT11 $url};
# wget gets something, but not the items; it gets what I see in Google Chrome
`GET $url2 >OUTPUT11`; # it does not write anything to the file
OK, given your code, the problem is almost certainly one of interpolation: the & in your URL is interpreted by the shell you're spawning as 'run this process in the background', so everything after the & never reaches wget as part of the URL.
That's almost certainly not what you want. Why not just use LWP natively?
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;
my $url='http://www.ncbi.nlm.nih.gov/nuccore?linkname=pcassay_nucleotide&from_aid=504934,1806,1805,1674';
my $content = get $url;
die "Unable to get $url\n" unless defined $content;
print $content;
open ( my $output_fh, '>', 'output.html' ) or die $!;
print {$output_fh} $content;
close ( $output_fh );
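If you would rather keep calling wget, another option is to bypass the shell entirely by passing system a list of arguments. This is a minimal sketch, assuming wget is on the PATH; the helper name run_without_shell is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell; true on exit status 0.
sub run_without_shell {
    my @cmd = @_;
    # With more than one argument, system() calls the program directly,
    # so characters such as & and ? in an argument are not interpreted.
    return system(@cmd) == 0;
}

# Usage: perl fetch_with_wget.pl <URL>
if (@ARGV) {
    my $url = $ARGV[0];
    run_without_shell('wget', '--quiet', '--output-document=OUTPUT1', $url)
        or warn "wget failed (status $?)\n";
}
```

Wrapping the URL in single quotes inside the qx{} command is a smaller change that also keeps the shell from splitting the command at the &.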

Why isn't this perl cgi script redirecting?

I have a perl cgi script that is exactly the following:
#!/usr/bin/perl -w
use CGI;
$query = new CGI;
print $query->redirect("http://www.yahoo.com");
At the command line things look OK:
$perl test.pl
Status: 302 Moved
Location: http://www.yahoo.com
When I load it in the browser, http://localhost/cgi-bin/test.pl, the request gets aborted, and depending on the browser I get various messages:
Connection reset by server.
No data received.
The only research I could find on this issue, stated that a common problem is printing some data or header before the redirect call, but I am clearly not doing that here.
I'm hosting it from a QNX box with the default slinger server.
The code works fine on my machine. Check the following:
Check the error logs, e.g. tail /var/log/http/error_log
Do the chmod/chown permissions match those of other working CGI scripts? Compare using ls -l
Does printing the standard hello world work? Change your print statement to
print $query->header(), 'Hello World';
Add the following for better errors
use warnings;
use diagnostics;
use CGI::Carp 'fatalsToBrowser';
At the command line, the command "use slinger" will print some basic usage options. For logging you need both syslogd running and -d enabled in slinger, i.e.
slinger -d &
Then look to /var/log/syslog for errors

How to execute one perl script from website in perl?

I am trying to run a Perl script, which does some work and creates files, from a web page generated by another Perl script. I am using Windows 7.
This is the source:
use CGI;
use warnings;
use strict;
print "Content-type:text/html; charset=utf-8\r\n\r\n";
print "<a href='./#'>START</a>";
system("C:\Perl\bin\perl C:\xampp\htdocs\xampp\bc\create_yaml.pl");
When I load this page, it opens cmd, but the script I want to run does not create any files. How can I find out whether the script ran? And how do I run it?
I tried changing the permissions of the file I want to run, but it still doesn't work.
Thanks for your answers.
I tried a simpler example, but it doesn't create any file either. What's wrong?
use CGI;
use strict;
use warnings;
print "Content-Type: text/html; charset=utf-8\n\n";
system("C:\\Perl\\bin\\perl C:\\xampp\\htdocs\\xampp\\vyber\\bc\\test\\create.pl");
source of create.pl:
open(INFO,">aaaaaaa.txt");
print INFO "voda";
close INFO;
I think your issue is that Windows uses \ in path names, but inside a double-quoted Perl string \ is the escape character, so it must itself be escaped with another \:
system("C:\\Perl\\bin\\perl C:\\xampp\\htdocs\\xampp\\bc\\create_yaml.pl");
Also, if your PATH environment variable is set up correctly, you can just do this:
system("perl C:\\xampp\\htdocs\\xampp\\bc\\create_yaml.pl");
Or as amon pointed out, you can use forward slashes instead:
system("C:/Perl/bin/perl C:/xampp/htdocs/xampp/bc/create_yaml.pl");
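To see what the unescaped backslashes actually do, here is a minimal sketch using a shortened, purely illustrative path (C:\xampp\bc is not the real script path from the question). It compares the same text written in double quotes, in single quotes, and with forward slashes.

```perl
use strict;
use warnings;

# In double quotes, \x starts a hex escape and \b is a backspace,
# so the backslashes silently disappear from the path.
my $mangled = "C:\xampp\bc";
# Single quotes keep each backslash; forward slashes avoid the issue.
my $literal = 'C:\xampp\bc';
my $forward = 'C:/xampp/bc';

printf "double-quoted: %d characters\n", length $mangled;
printf "single-quoted: %d characters\n", length $literal;
print  "backslashes kept in single quotes: ", ($literal =~ tr/\\//), "\n";
print  "forward-slash form: $forward\n";
```

The double-quoted string comes out shorter because each escape sequence collapses into a single control character, which is why the command passed to system no longer names a real file.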

Why doesn't my Perl CGI script work?

I really do not get how to run a Perl file. I have uploaded my .pl file to the cgi-bin directory and set its permissions to 755 with chmod. But when I try to run the file, I just get a 500 Internal Server Error.
**/cgi-bin/helloworld.pl**
#!/usr/bin/perl
print 'hello world';
Any ideas as to what I am doing wrong?
Read the official Perl CGI FAQ.
That'll answer this, and many other questions you may have.
For example: "My CGI script runs from the command line but not the browser. (500 Server Error)"
Hope this helps!
You probably need something like
print "Content-type: text/html\n\n";
before your print statement. Take a look at http://httpd.apache.org/docs/2.0/howto/cgi.html#troubleshoot
It would help to know what server you are using, and the exact error message that's showing up in the server's logs. I'd guess that, if you are using Apache, you'll see something like "Premature end of script headers".
Look into using CGI::Carp to output fatal errors to the browser. use CGI::Carp qw(fatalsToBrowser);
Also, please definitely do use the CGI module to output any needed information such as headers/html/whatever. Printing it all is the wrong way to do it.
EDIT: You will also definitely be able to check an error log of some sort.
Perhaps you need my Troubleshooting Perl CGI scripts
First, find out the path to perl on that system and make sure the shebang line is correct. Giving more information about the system and the web server would also help others diagnose.
Then, try:
#!/path/to/perl/binary
use strict;
use warnings;
$| = 1;
use CGI qw( :standard );
print header('text/plain'), "Hello World\n";
Make sure that you can run the script from a shell prompt, without invoking it through Perl. In other words, you should be able to go to your cgi-bin directory and type:
./helloworld.pl
and get output. If that doesn't work, fix that. In looking at the output, the first line must be:
Content-Type: text/html
(Or text/plain or some other valid MIME type.)
If that's not the case, fix that.
Then you must have an empty line before the body of your page is printed. If there's no empty line, your script won't work as a CGI script. So your total output should look like this:
Content-Type: text/html

hello world
If you can run your script and that's the output, then there's something weird going on. If Apache is not logging the error to an error_log file somewhere, then maybe there's some problem with it.
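The header-then-empty-line layout described above can also be produced without any module, which is handy for a first sanity check. This is a minimal sketch; the helper name cgi_response is hypothetical, and the CGI module's header() shown earlier remains the approach the answers recommend.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: header line, empty line, then the body.
sub cgi_response {
    my ($body) = @_;
    return "Content-Type: text/plain\n\n" . $body;
}

$| = 1;    # unbuffered output
print cgi_response("hello world\n");
```

Running the script from the shell should show the Content-Type line first, then an empty line, then the body.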
Did you enable Apache to serve .pl files as CGI scripts? Check your Apache config file, or (a quick but not guaranteed test) try changing the file extension to .cgi. Also, make sure your shebang line (#!) is at the very top. Finally, check that the line endings are Unix-style if your server runs Linux. And yes, test it from the command line, and add use strict; for better feedback on potential errors.