Unable to get page via HTTPS with LWP::Simple in Perl

I'm trying to download a page from an HTTPS URL with Perl:
use LWP::Simple;
my $url = 'https://www.ferc.gov/xml/whats-new.xml';
my $content = get $url or die "Unable to get $url\n";
print $content;
There seems to be a problem, but I just can't figure out the error: I can't get the page. Is the get request coded incorrectly? Do I need to use a user agent?

LWP::Protocol::https is needed to make HTTPS requests with LWP. It needs to be installed separately from the rest of LWP. It looks like you installed LWP, but not LWP::Protocol::https, so simply install it now.
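If you want to see why LWP::Simple's get returns nothing, a minimal sketch along these lines (my addition, not part of the original answer; it keeps the same ferc.gov URL) uses LWP::UserAgent, which reports the reason, e.g. "501 Protocol scheme 'https' is not supported" when LWP::Protocol::https is missing:
use strict;
use warnings;
use LWP::UserAgent;

my $url = 'https://www.ferc.gov/xml/whats-new.xml';
my $ua  = LWP::UserAgent->new;
my $res = $ua->get($url);

# status_line carries the reason, e.g. the missing https protocol handler
die "Unable to get $url: ", $res->status_line, "\n" unless $res->is_success;
print $res->decoded_content;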

Related

Perl url encoding using Curl

I am having an issue where I am using cURL inside a Perl script to execute an HTTP request. I believe my issue is related to special characters in the URL string, but I cannot figure out how to make it work.
I can confirm that the URL is correct as I can run it from my browser.
My perl script is
#!/usr/bin/perl
use strict;
use warnings;
$url = "http://machine/callResync?start=2017-02-01 00.00.00.000&end=2017-02-01 23.23.999";
system "curl $url
It fails when it reaches the first whitespace. I tried to escape that using %20.
After that I put in %26 to escape the &, but then I get another issue. I have tried a number of different combinations but it keeps failing.
Any ideas?
Use the URI module to correctly build the URL, and rather than shelling out to cURL you should use a Perl library like LWP::Simple to access the page.
The disadvantage of LWP::Simple is that it may be too simple in that it provides no diagnostics if the transaction fails. If you find you need something more elaborate then you should look at
HTTP::Tiny,
LWP::UserAgent, or
Mojo::UserAgent.
If you need help with these then please ask.
use strict;
use warnings 'all';
use URI;
use LWP::Simple 'get';
my $url = URI->new('http://machine/callResync');
$url->query_form(
    start => '2017-02-01 00.00.00.000',
    end   => '2017-02-01 23.23.999',
);
my $content = get($url) or die "Failed to access URL";
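If the simple version is not enough, here is a hedged sketch of the same request with HTTP::Tiny (one of the modules mentioned above; this is my addition, assuming the same machine/callResync endpoint), which reports a status and reason when the transfer fails:
use strict;
use warnings;

use URI;
use HTTP::Tiny;

my $url = URI->new('http://machine/callResync');
$url->query_form(
    start => '2017-02-01 00.00.00.000',
    end   => '2017-02-01 23.23.999',
);

# HTTP::Tiny returns a hash reference describing the transaction
my $res = HTTP::Tiny->new->get("$url");
die "Failed to access $url: $res->{status} $res->{reason}\n" unless $res->{success};
print $res->{content};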
Problem number 1: You used an invalid URL. Spaces can't appear in URLs.
my $url = "http://machine/callResync?start=2017-02-01%2000.00.00.000&end=2017-02-01%2023.23.999";
Problem number 2: Shell injection error. You didn't correctly form your shell command.
system('curl', $url);
or
use String::ShellQuote qw( shell_quote );
my $cmd = shell_quote('curl', $url);
system($cmd);
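Putting the two fixes together, a sketch assuming the same machine/callResync endpoint might build the URL with URI (so the spaces are encoded for you) and call curl with the list form of system (so the shell never sees the & at all):
use strict;
use warnings;
use URI;

my $url = URI->new('http://machine/callResync');
$url->query_form(
    start => '2017-02-01 00.00.00.000',
    end   => '2017-02-01 23.23.999',
);

# list form: curl receives the URL as a single argument, no shell involved
system('curl', "$url") == 0
    or die "curl failed: $?\n";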

Perl LWP::Simple won't "get" a webpage when running from remote server

I'm trying to use Perl to scrape a publications list as follows:
use XML::XPath;
use XML::XPath::XMLParser;
use LWP::Simple;
my $url = "https://connects.catalyst.harvard.edu/Profiles/profile/xxxxxxx/xxxxxx.rdf";
my $content = get($url);
die "Couldn't get publications!" unless defined $content;
When I run it on my local (Windows 7) machine it works fine. When I try to run it on the linux server where we are hosting some websites, it dies. I installed XML and LWP using cpan so those should be there. I'm wondering if the problem could be some sort of security or permissions on the server (keeping it from accessing an external website), but I don't even know where to start with that. Any ideas?
Turns out I didn't have LWP::Protocol::https installed. I found this out by switching
LWP::Simple
to
LWP::UserAgent
and adding the following:
my $ua = LWP::UserAgent->new;
my $resp = $ua->get('https://connects.catalyst.harvard.edu/Profiles/profile/xxxxxx/xxxxxxx.rdf' );
print $resp->status_line;  # shows why the request failed
It then returned an error telling me it couldn't access HTTPS without LWP::Protocol::https, so I installed it with
cpan LWP::Protocol::https
and all was good.
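A quick way to confirm whether that handler is available, sketched here as my own addition, is to ask LWP which class implements the https scheme; LWP::Protocol::implementor returns a false value when LWP::Protocol::https is not installed:
use strict;
use warnings;
use LWP::Protocol;

my $class = LWP::Protocol::implementor('https');
print $class
    ? "https is handled by $class\n"
    : "no https support - install LWP::Protocol::https\n";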

Error while using WWW::Mechanize in Perl to open a Webpage

I am trying to open a web page in Perl using WWW::Mechanize module. The code for the same is as follows:
use WWW::Mechanize;
my $m = WWW::Mechanize->new();
$url = 'http://www.google.com';
$m->get($url);
print "$m->content()";
When I run this code I get an error like this:
Error GETing http://www.google.com: Can't connect to www.google.com:80
What could be the reason for such an error, and how can I change my code so that it opens the webpage specified in the URL?
There are two problems:
The line print "$m->content()"; should be written print $m->content(); otherwise you will get something like WWW::Mechanize=HASH(0xeca870)->content() instead of the page content, because method calls are not interpolated inside double-quoted strings.
It also seems that you have a network or software problem: the rest of your code works.
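Putting both points together, a corrected sketch (my rewrite, not the asker's exact code) with explicit error handling might look like this:
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 0 stops Mechanize from dying on its own, so the failure
# can be reported with a readable message instead
my $mech = WWW::Mechanize->new( autocheck => 0 );
my $res  = $mech->get('http://www.google.com');

die 'Error GETing http://www.google.com: ', $res->status_line, "\n"
    unless $res->is_success;

# method call outside the quotes, so content() is actually executed
print $mech->content();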

Perl mechanize response is only "<HTML></HTML>" with https

I'm kind of new to Perl, and even newer to Mechanize. So far, fetching a site via HTTP has been no problem.
Now I need to fetch a site with https. I've installed Crypt::SSLeay via PPM.
When I use $mech->get($url), though, this is the only response I get:
"<HTML></HTML>"
I checked the status and success, both were OK (200 and 1).
Here's my code:
use strict;
use warnings;
use WWW::Mechanize;
use Crypt::SSLeay;
$ENV{HTTPS_PROXY} = 'http://username:pw@host:port';
# I have the https_proxy env variable set globally too.
my $url = 'https://google.com';
# Every https site has the same response,
# so I don't think google would cause problems.
my $mech = WWW::Mechanize->new(noproxy => 0);
$mech->get($url) or die "Couldn't load page";
print "Content:\n".$mech->response()->content()."\n\n";
As you can see I'm behind a proxy. I tried setting
$mech->proxy($myproxy);
but to no avail. I even tried to fetch it into a file, but when I checked it, I got the same response content.
Any kind of advice would be appreciated, since I'm just a beginner and there is still a lot to learn. Thanks!
I think the answer lies here: How do I force LWP to use Crypt::SSLeay for HTTPS requests?
use Net::SSL (); # From Crypt-SSLeay
BEGIN {
    $Net::HTTPS::SSL_SOCKET_CLASS = "Net::SSL"; # Force use of Net::SSL
    $ENV{HTTPS_PROXY} = 'http://10.0.3.1:3128'; # your proxy!
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;
}
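A hedged sketch of how that BEGIN block might be combined with the Mechanize code from the question (the proxy address is a placeholder, and the no warnings 'once' line only silences the "used only once" warning for the package variable):
use strict;
use warnings;

use Net::SSL (); # from Crypt-SSLeay

BEGIN {
    no warnings 'once';                          # the variable is read later by Net::HTTPS
    $Net::HTTPS::SSL_SOCKET_CLASS = 'Net::SSL';  # force Net::SSL instead of IO::Socket::SSL
    $ENV{HTTPS_PROXY} = 'http://10.0.3.1:3128';  # placeholder proxy
    $ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;      # Crypt::SSLeay cannot verify hostnames
}

use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->get('https://google.com');   # autocheck dies here with the real error on failure
print "Content:\n", $mech->response()->content(), "\n";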

WWW::Mechanize timeout - all urls timing out

I am having a problem using WWW::Mechanize. It seems no matter what website I try accessing, my script just sits there in the command prompt until it times out. The only things that come to mind that might be relevant are the following:
I have IE7, chrome, and FF installed. FF was my default browser but I recently switched that to chrome.
I seem to be able to access websites with port 8080 just fine.
I recently experimented with the cookie jar but stopped using it because, honestly, I'm not sure how it works. This may have caused the change.
Here is an example:
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
my $url = 'http://docstore.mik.ua/orelly/perl/learn/';
my $mech = WWW::Mechanize->new();
$mech->get( $url );
print $mech->content;
The code seems to work, so it must be a firewall/proxy issue. You can try setting a proxy:
$mech->proxy(['http', 'ftp'], 'http://your-proxy:8080/');
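If the proxy details already live in the environment (http_proxy, ftp_proxy, no_proxy), a hedged alternative is to let Mechanize pick them up itself; lowering the timeout also makes the failure show up much sooner while debugging:
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();
$mech->env_proxy();   # read http_proxy, ftp_proxy and no_proxy from the environment
$mech->timeout(15);   # give up after 15 seconds instead of the default 180
$mech->get('http://docstore.mik.ua/orelly/perl/learn/');
print $mech->content;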