Read a web page with Perl - perl

I am trying to read the content of a web page with perl on Windows 10. The code does not work for the following site:
https://www.dividendinvestor.com/dividend-quote/intc/
Here is the code I am using:
use LWP::Simple qw(get);
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $html = get $url;
print $html;
Any idea why I cannot read that page?

LWP::Simple is pretty basic and doesn't let you do anything clever like actually looking at the details of the response. So let's change to LWP::UserAgent and see what the response is.
use LWP::UserAgent;
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
my $resp = $ua->get($url);
print $resp->status_line;
This prints:
403 Forbidden
So I think that Quentin's comment is correct and that the site's owners are blocking people who use technology like LWP.
So let's change the useragent string to look like Internet Explorer.
use LWP::UserAgent;
my $agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko';
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
$ua->agent($agent);
my $resp = $ua->get($url);
print $resp->status_line;
Now I get:
200 OK
So we should be ok to get the content.
use LWP::UserAgent;
my $agent = 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko';
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
$ua->agent($agent);
my $resp = $ua->get($url);
if ($resp->is_success) {
print $resp->content;
} else {
print $resp->status_line;
}
And that seems to work fine.
Note: Of course, changing the useragent string like this is rather dishonest. Presumably, the site's owners have a good reason for wanting to dissuade people from accessing their site in this way. So don't annoy them by trying to get around their restrictions. Read the site's terms of service to see what they want you to do. Perhaps they have an API available that will give you the data you want.

As Dave Cross wrote, the problem is related to the user agent. It is possible to use the LWP::Simple module in this way:
use LWP::Simple qw/$ua get/;
$ua->agent('Mozilla/5.0');
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $html = get $url;
print $html;
As the documentation points out, the user agent created by this module (LWP::Simple) identifies itself as "LWP::Simple/#.##", so we can change it before the GET request.
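To see what LWP::Simple actually announces, you can inspect the exported $ua object before and after changing it. This is a minimal sketch that makes no network request; the replacement string MyTool/1.0 is a placeholder of mine, not something the site requires.

```perl
use strict;
use warnings;
use LWP::Simple qw($ua);

# $ua is the LWP::UserAgent object that LWP::Simple uses internally.
my $default_agent = $ua->agent;
print "Default: $default_agent\n";

# Replace the identification string (placeholder value, pick your own).
$ua->agent('MyTool/1.0');
my $custom_agent = $ua->agent;
print "Custom:  $custom_agent\n";
```

Any later call to get() in the same program goes out with the custom string.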

Related

Using KeyForge API with Perl

I'm trying to call the KeyForge API with a simple Perl program but it doesn't work. I'm using what's in the LWP::UserAgent documentation:
use strict;
use warnings;
use LWP::UserAgent ();
my $ua = LWP::UserAgent->new;
my $response = $ua->get('https://www.keyforgegame.com/api/decks/');
if ($response->is_success) {
print $response->decoded_content;
}
else {
die $response->status_line;
}
The program prints:
500 write failed: at test.pl line 16.
If I use the URL https://www.google.com or http://www.example.com, it works. The HTML is correctly displayed.
If I use this simple PowerShell program, it works too:
$Url = "https://www.keyforgegame.com/api/decks/"
$decks = Invoke-RestMethod ($url)
$decks
It displays:
count data
743719 {@{name=Dr. "The Old" Jeffries; expansion=341; power_level=0; chains=0; wins=0; losses=0; id=ec86db52-e41e-4e...
What am I missing?
PS: I'm using Perl 5.16.3 on Windows 10.
EDIT:
Thank you all for your help. I finally found out what was happening. It turns out I had a very old version of Net::HTTP (from 2013). I upgraded it and now it works out of the box, without configuring agent, cookies or e-mail. The error message I had was actually from the client and not from the server.
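Since the cause here was an outdated module, a quick way to check what is installed is to print the versions of the modules involved in an HTTPS request. This is a rough sketch; the module list is my own guess at the usual suspects, not an exhaustive list of LWP's dependencies.

```perl
use strict;
use warnings;

# Modules that typically take part in an HTTPS request made with LWP
# (assumed list, adjust as needed).
my @modules = qw(LWP::UserAgent Net::HTTP LWP::Protocol::https IO::Socket::SSL);

my %versions;
for my $module (@modules) {
    # Try to load the module; leave the version undefined if that fails.
    my $version = eval "require $module; \$module->VERSION";
    $versions{$module} = $version;
    printf "%-22s %s\n", $module, defined $version ? $version : 'not installed';
}
```

If one of the versions looks several years old, upgrading it from CPAN is a reasonable first step before debugging the request itself.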
$ perl -MLWP::UserAgent -e'
my $ua = LWP::UserAgent->new();
my $response = $ua->get("https://www.keyforgegame.com/api/decks/");
print $response->as_string;
'
HTTP/1.1 403 Forbidden
...
Content-Type: text/html; charset=UTF-8
...
<!DOCTYPE html>
...
<title>Access denied | www.keyforgegame.com used Cloudflare to restrict access</title>
...
<h2 data-translate="what_happened">What happened?</h2>
<p>The owner of this website (www.keyforgegame.com) has banned your access based on your browser's signature (4bfe0c0e2e86ab84-ua22).</p>
...
But,
$ perl -MLWP::UserAgent -e'
use version; our $VERSION = qv("v1.0.0");
my $ua = LWP::UserAgent->new(
agent => "NameOfTool/$VERSION",
from => q{me@example.com},
);
my $response = $ua->get("https://www.keyforgegame.com/api/decks/");
print $response->as_string;
'
HTTP/1.1 200 OK
...
Content-Type: application/json
...
{"count":...
If they want to block you, they can. So it's in your best interest to provide a unique application name, a proper version and a valid email address (even if providing junk for the agent and leaving out the from field works). This gives them more options to resolve any issues they have with your program.
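If you want to check which identification headers your program will send before you hit the server, prepare_request() fills in the user agent's default headers without touching the network. This is a small sketch; the tool name and email address are the same placeholders as in the example above.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new(
    agent => 'NameOfTool/1.0.0',   # placeholder application name and version
    from  => 'me@example.com',     # placeholder contact address
);

# prepare_request() copies the user agent's default headers onto the request
# but does not send anything.
my $request = $ua->prepare_request(
    HTTP::Request->new(GET => 'https://www.keyforgegame.com/api/decks/')
);

my $agent_header = $request->header('User-Agent');
print $request->as_string;
```

The printed request should show the User-Agent line (and the From line, if your LWP version stores it as a default header) exactly as the server would receive it.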

WWW::Mechanize send custom HTTP headers

Hi, I'm writing a small program that opens a web page. The page needs to receive my MSISDN before it will let me log in. I'm trying to send it this way:
#!/usr/bin/perl
use WWW::Mechanize;
my $target = "http://www.example.domain/subscription/showsubscribe";
my $user_agent = 'Mozilla/5.0 (Linux; Android 4.2.2; es-us; SAMSUNG GT-I9195L Build/JDQ39) AppleWebKit/535.19 (KHTML, like Gecko) Version/1.0 Chrome/18.0.1025.308 Mobile Safari/535.19';
my $phonenumber = 'XXXXXXXXXX';
my $mech = WWW::Mechanize->new(agent=>$user_agent);
$mech->add_header('x-msisdn'=> $phonenumber);
my $response = $mech->get($target);
die "Error at '$target'\n", $response->status_line, "\n
Aborting" unless $response->is_success;
print $mech->content;
$response = $mech->response;
for my $key ($response->header_field_names()) {
print "response[$key] = ", $response->header($key), "\n";
}
I got the x-msisdn header name from this page, which I found through the forum: http://mobiforge.com/design-development/useful-x-headers
Any idea how I can send the HTTP header?
Thanks in advance!
... $mech->add_header('x-msisdn'=> $phonenumber);
any idea of how i can send the HTTP header?
The header gets sent (check with wireshark or similar).
Your problem is something different.
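If you don't have Wireshark at hand, you can also confirm this from inside Perl. The sketch below installs a request_send handler that captures the outgoing request and returns a canned response, so nothing goes over the network. The phone number and the stub HTML are placeholders I made up.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Response;

my $mech = WWW::Mechanize->new(autocheck => 0);
$mech->add_header('x-msisdn' => '5550001111');   # placeholder number

my $sent_request;
$mech->add_handler(
    request_send => sub {
        my ($request) = @_;
        # Keep a copy of the request exactly as LWP is about to send it.
        $sent_request = $request->clone;
        # Returning a response here stops LWP from contacting the network.
        return HTTP::Response->new(
            200, 'OK',
            [ 'Content-Type' => 'text/html' ],
            '<html><body>stub</body></html>',
        );
    },
);

$mech->get('http://www.example.domain/subscription/showsubscribe');

my $msisdn_header = $sent_request->header('x-msisdn');
print $sent_request->as_string;
```

Remove the handler (or drop the return value) to send the request for real once you are satisfied with the headers.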

Cannot login with UserAgent

I have managed to log in with the code below, but it now works only once a day.
After that I can't log in and get the login page in the response instead.
But when I print $reqstr from the code below and paste it into a browser (like Firefox), I can log in.
Wget doesn't work either. Only a normal browser does.
Sometimes it seems that I'm logged in, but I only get content like this:
"<html>\cJ<head>\cJ\cI<meta http-equiv=\"content-type\" content=\"text/html; charset=ISO-8859-1\"><meta http-equiv=\"expires\" content=\"0\"><meta http-equiv=\"pragma\" content=\"no-cache\">\cJ\cI<meta http-equiv=\"refresh\" content=\"0; URL='https://www.address.com/'\">\cJ</head>\cJ</html>\cJ"
I also noticed that while I can't log in, I'm getting this part in a debugger:
_uri_canonical' => URI::https=SCALAR(0x17dad28)
-> REUSED_ADDRESS
'handlers' => HASH(0x22dc0c0)
'response_data' => ARRAY(0x22ee8b8)
0 HASH(0x22d9a48)
'callback' => CODE(0x22dba30)
-> &LWP::UserAgent::__ANON__[/usr/lib/perl5/vendor_perl/5.10.0/LWP/UserAgent.pm:682] in /usr/lib/perl5/vendor_perl/5.10.0/LWP/UserAgent.pm:679-682
1 HASH(0x22eea08)
'callback' => CODE(0x22d9cb8)
-> &LWP::Protocol::__ANON__[/usr/lib/perl5/vendor_perl/5.10.0/LWP/Protocol.pm:138] in /usr/lib/perl5/vendor_perl/5.10.0/LWP/Protocol.pm:135-138
Any clue?
Here is the code:
my $b = LWP::UserAgent->new(agent => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.5) Gecko/20060719 Firefox/31.2.0',);
my $cookie_jar = HTTP::Cookies->new(
file => 'lwp_cookies.txt',
autosave => 1,
ignore_discard => 1,
);
$cookie_jar->clear;
$cookie_jar->clear_temporary_cookies;
$b->cookie_jar($cookie_jar);
my $url = "https://www.address.com";
my $r = $b->get($url);
$r->decoded_content =~ /FORM ACTION="(.*?)" METHOD/msgi;
my $a = "$url$1";
print $a."\n";
my $reqstr = $a."&LoginAction=Login&Number=55555&KPassword=passw&UserID=uid";
my $req = HTTP::Request->new(POST => $reqstr);
$req->header('Host', 'www.address.com');
$req->header('User-Agent', 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0');
$req->header('Connection', 'keep-alive');
$req->header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8');
my $c = $b->request($req);
You need to re-request that page with the referrer added via referer() for LWP::UserAgent (or see my second answer if you aren't wedded to that module)
sub login { # Code not tested and not really compilable, just a stub for you
my ($url, $referrer_url, @other_args) = @_;
# Add your login code from the question, up to calling $b->request()
$req->referer($referrer_url) if $referrer_url;
my $c = $b->request($req);
return $c; # Or return the response?
}
my $result1 = login($original_login_url); #first try
# Obtain the redirect_url from the response.
# If it was a 301 redirect, you can do it via
# my @redirects = $response->redirects();
my $referrer_url = $original_login_url;
my $result2 = login($redirect_url, $referrer_url);
References:
http://forums.devshed.com/perl-programming-6/lwp-meta-refresh-tag-handling-63484.html
http://www.herongyang.com/Perl/LWP-UserAgent-Follow-HTTP-Redirects.html
If you aren't dead set on using LWP::UserAgent, use WWW::Mechanize instead.
Best approach: use WWW::Mechanize::Plugin::FollowMetaRedirect. The SYNOPSIS is pretty short and to the point:
use WWW::Mechanize;
use WWW::Mechanize::Plugin::FollowMetaRedirect;
my $mech = WWW::Mechanize->new;
$mech->get( $url );
$mech->follow_meta_redirect;
# Optionally, skip emulating the waiting time
$mech->follow_meta_redirect( ignore_wait => 1 );
If you don't have access to that module, you can create your own, similar to this: http://www.perlmonks.org/?node_id=487286
(Basically, parse the returned content with a regex (shudder) to extract the refresh URL, then fetch that URL. As per my other answer, you might need to add the referrer header.)
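A rough sketch of that regex approach, run against the response body quoted in the question. The helper name meta_refresh_url is mine, and the pattern only covers the common attribute order (http-equiv before content), so treat it as a starting point and not as a general HTML parser.

```perl
use strict;
use warnings;

# Response body as shown in the question (control characters written out).
my $html = <<'HTML';
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
<meta http-equiv="refresh" content="0; URL='https://www.address.com/'">
</head>
</html>
HTML

# Hypothetical helper: returns the refresh target, or nothing if none is found.
sub meta_refresh_url {
    my ($content) = @_;
    return $1
        if $content =~ m{<meta\s+http-equiv=["']?refresh["']?\s+content=["']\s*\d+\s*;\s*URL=\s*['"]?([^'"\s>]+)}i;
    return;
}

my $refresh_url = meta_refresh_url($html);
print defined $refresh_url
    ? "Refresh target: $refresh_url\n"
    : "No meta refresh found\n";
```

Once you have the target, request it with the same user agent object so the cookie jar is reused, and set the referrer if the site checks it.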

how to send a http patch request with Lwp::Useragent?

I am working against the Salesforce REST API with LWP::UserAgent.
I have to use the HTTP PATCH request.
For GET and POST requests we can use the following code:
require LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
my $get_response = $ua->get('http://search.cpan.org/',x=>'y');
my $post_response = $ua->post('http://search.cpan.org/',x=>'y');
Unfortunately this does not work
my $patch_response = $ua->patch('http://search.cpan.org/',x=>'y');
I can't find how to do it with this module.
There is a workaround for this problem, as explained in "How do I send a request using the PATCH method for a Salesforce update?"
This works, but it is not a nice solution.
I saw that in Python it is possible to make PATCH requests explicitly ("How do I make a PATCH request in Python?"), so I assume there is also an option in Perl.
my $request = HTTP::Request->new(PATCH => $url);
... Add any necessary headers and body ...
my $response = $ua->request($request);
This has recently got a whole lot easier. PATCH is now implemented (like POST) in HTTP::Message.
First, update the HTTP::Message module (to 6.13 or later).
Then
use LWP::UserAgent;
use HTTP::Request::Common ();
my %fields = ( title => 'something', body => 'something else' );
my $ua = LWP::UserAgent->new();
my $request = HTTP::Request::Common::PATCH( $url, [ %fields ] );
my $response = $ua->request($request);
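REST APIs usually expect a JSON body and not form fields, so here is a sketch of the plain HTTP::Request variant with a JSON payload. The URL and the field name are placeholders I invented, and the request is only built and printed, not sent.

```perl
use strict;
use warnings;
use HTTP::Request;
use JSON::PP qw(encode_json);

# Placeholder endpoint and payload; substitute the real record URL and fields.
my $url  = 'https://example.com/api/records/42';
my $body = encode_json({ title => 'something' });

# HTTP::Request accepts any method name, so PATCH needs no special support.
my $request = HTTP::Request->new(
    PATCH => $url,
    [ 'Content-Type' => 'application/json' ],
    $body,
);

print $request->as_string;

# To send it: my $response = LWP::UserAgent->new->request($request);
```

Most APIs also need an authorization header, which you can add with $request->header(...) before sending.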

Perl HTTP request : POST fails while GET succeeds

When I try to submit a POST request with Perl, it often ends in a 301 redirect to the homepage. Here is the code :
use LWP::UserAgent;
$ua = LWP::UserAgent->new;
# This does not work
my $url = 'http://www.opensubtitles.org/en/search2';
my $req = HTTP::Request->new(POST => $url);
$req->content('MovieName=the+terminator+(1996)');
# Pass request to the user agent and get a response back
print $req->as_string."\n";;
my $res = $ua->request($req);
if (!$res->is_success) {
print $res->status_line, "\n";
}
else {
print "Success in posting search\n";
}
In order to make it work, I have to manually use Firefox, go to the url (!). Then the script works. However, using a GET request works flawlessly :
# This works
my $url = 'http://www.opensubtitles.org/en/search2?MovieName=the+terminator+(1996)';
my $req = HTTP::Request->new(GET => $url);
Why is that ?
The site doesn't expect a POST to that URL, so it redirects you back to the search page.
Firefox will use GET, not POST, if you just put the URL into the address line, that's why it works.
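If you stay with GET, you can let the URI module build and escape the query string for you, which avoids writing it by hand. This is a minimal sketch that only constructs the request; the search term is the one from the question.

```perl
use strict;
use warnings;
use URI;
use HTTP::Request;

my $uri = URI->new('http://www.opensubtitles.org/en/search2');

# query_form() escapes the value and joins key=value pairs for us.
$uri->query_form(MovieName => 'the terminator (1996)');

my $req = HTTP::Request->new(GET => $uri);
print $req->as_string;

# To send it: my $res = LWP::UserAgent->new->request($req);
```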