WWW::Mechanize send custom HTTP headers - perl

Hi im making a little program for open a webpage this webpage needs to receive my msisdn for allow me to receive the login, im trying to send it by this way
#!/usr/bin/perl
use WWW::Mechanize;
my $target = "http://www.example.domain/subscription/showsubscribe";
my $user_agent = 'Mozilla/5.0 (Linux; Android 4.2.2; es-us; SAMSUNG GT-I9195L Build/JDQ39) AppleWebKit/535.19 (KHTML, like Gecko) Version/1.0 Chrome/18.0.1025.308 Mobile Safari/535.19';
my $phonenumber = 'XXXXXXXXXX';
my $mech = WWW::Mechanize->new(agent=>$user_agent);
$mech->add_header('x-msisdn'=> $phonenumber);
my $response = $mech->get($target);
die "Error at '$target'\n", $response->status_line, "\n
Aborting" unless $response->is_success;
print $mech->content;
$response = $mech->response;
for my $key ($response->header_field_names()) {
print "response[$key] = ", $response->header($key), "\n";
the X-msisdn variable i got reading in the forum from this page: http://mobiforge.com/design-development/useful-x-headers
any idea of how i can send the HTTP header?
thanks in advance!

... $mech->add_header('x-msisdn'=> $phonenumber);
any idea of how i can send the HTTP header?
The header gets sent (check with wireshark or similar).
Your problem is something different.

Related

Read a web page with Perl

I am trying to read the content of a web page with perl on Windows 10. The code does not work for the following site:
https://www.dividendinvestor.com/dividend-quote/intc/
Here is the code I am using:
use LWP::Simple qw(get);
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $html = get $url;
print $html;
Any idea why I cannot read that page?
LWP::Simple is pretty basic and doesn't let you do anything clever like actually looking at the details of the response. So let's change to LWP::UserAgent and see what the response is.
use LWP::UserAgent;
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
my $resp = $ua->get($url);
print $resp->status_line;
This prints:
403 Forbidden
So I think that Quentin's comment is correct and that the site's owners are blocking people who use technology like LWP.
So let's change the useragent string to look like Internet Explorer.
use LWP::UserAgent;
my $agent = ' Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko';
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
$ua->agent($agent);
my $resp = $ua->get($url);
print $resp->status_line;
Now I get:
200 OK
So we should be ok to get the content.
use LWP::UserAgent;
my $agent = ' Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko';
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $ua = LWP::UserAgent->new;
$ua->agent($agent);
my $resp = $ua->get($url);
if ($resp->is_success) {
print $resp->content;
} else {
print $resp->status_line;
}
And that seems to work fine.
Note: Of course, changing the useragent string like this is rather dishonest. Presumably, the site's owners have a good reason for wanting to dissuade people from accessing their site in this way. So don't annoy them by trying to get around their restrictions. Read the site'sterms of service to see what they want to to do. Perhaps they have an API available that will give you the data you want.
As Dave Cross wrote, the problem is related to the user agent. It is possible to use the LWP::Simple module in this way:
use LWP::Simple qw/$ua get/;
$ua->agent('Mozilla/5.0');
my $url = 'https://www.dividendinvestor.com/dividend-quote/intc/';
my $html = get $url;
print $html;
As the documentation points, the user agent created by this module (LWP::Simple) will identify itself as "LWP::Simple/#.##". So we can change it before the "GET" request.

Using KeyForge API with Perl

I'm trying to call the KeyForge API with a simple Perl program but it doesn't work. I'm using what's in the LWP::UserAgent documentation:
use strict;
use warnings;
use LWP::UserAgent ();
my $ua = LWP::UserAgent->new;
my $response = $ua->get('https://www.keyforgegame.com/api/decks/');
if ($response->is_success) {
print $response->decoded_content;
}
else {
die $response->status_line;
}
The program prints:
500 write failed: at test.pl line 16.
If I use the URL https://www.google.com or http://www.example.com, it works. The HTML is correctly displayed.
If I use this simple PowerShell program, it works too:
$Url = "https://www.keyforgegame.com/api/decks/"
$decks = Invoke-RestMethod ($url)
$decks
It displays:
count data
743719 {#{name=Dr. "The Old" Jeffries; expansion=341; power_level=0; chains=0; wins=0; losses=0; id=ec86db52-e41e-4e...
What am I missing?
PS: I'm using Perl 5.16.3 on Windows 10.
EDIT:
Thank you all for your help. I finally found out what was happening. It turns out I had a very old version of Net::HTTP (from 2013). I upgraded it and now it works out of the box, without configuring agent, cookies or e-mail. The error message I had was actually from the client and not from the server.
$ perl -MLWP::UserAgent -e'
my $ua = LWP::UserAgent->new();
my $response = $ua->get("https://www.keyforgegame.com/api/decks/");
print $response->as_string;
'
HTTP/1.1 403 Forbidden
...
Content-Type: text/html; charset=UTF-8
...
<!DOCTYPE html>
...
<title>Access denied | www.keyforgegame.com used Cloudflare to restrict access</title>
...
<h2 data-translate="what_happened">What happened?</h2>
<p>The owner of this website (www.keyforgegame.com) has banned your access based on your browser's signature (4bfe0c0e2e86ab84-ua22).</p>
...
But,
$ perl -MLWP::UserAgent -e'
use version; our $VERSION = qv("v1.0.0");
my $ua = LWP::UserAgent->new(
agent => "NameOfTool/$VERSION",
from => q{me#example.com},
);
my $response = $ua->get("https://www.keyforgegame.com/api/decks/");
print $response->as_string;
'
HTTP/1.1 200 OK
...
Content-Type: application/json
...
{"count":...
If they want to block you, they can. So it's your best interest to provide a unique application name, a proper version and a valid email address (even if providing junk for the agent and leaving out from field works). This gives them more options to resolve any issues they have with your program.

Can't use concurrent ascynrounous URLs with Net::Async::HTTP .. It quits and doesn't goto the next URL

Using the concurrent asynchronous URL example for Net::Async::HTTP, the first encounter of bad URL (timeout, doesn't exist, etc) error causes the program to fail and exit completely, without continuing to the next URL in the array. Is the problem my code or the module?
I tried setting fail_on_error to 0, and even 1, but it had no obvious results.
#!/bin/perl
use IO::Async::Loop;
use Net::Async::HTTP;
use Future::Utils qw(fmap_void);
use strict;
use warnings;
use feature 'say';
my $ua_string = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36";
my $timeout = 10;
my $max_redirects = 10;
my $max_in_flight = 10;
my $max_connections_per_host = 10;
my $stall_timeout = 10;
my $max_recurse = "10";
my $max_per_host = "10";
my #URLs = ( "http://cnn.com", "http://google.com", "http://sdfsdfsdf24.com", "http://msn.net" );
my $loop = IO::Async::Loop->new();
my $http = Net::Async::HTTP->new();
$loop->add($http);
my $future = fmap_void {
my ( $url ) = #_;
$http->configure(user_agent => $ua_string);
$http->configure(timeout => $timeout );
$http->configure(max_redirects => $max_redirects);
$http->configure(max_in_flight => $max_in_flight);
$http->configure(max_connections_per_host => $max_connections_per_host);
$http->configure(stall_timeout => $stall_timeout);
$http->configure(fail_on_error => '0' );
$http->GET($url)->on_done(
sub {
my $response = shift;
say "Response: $response->code";
}
)->on_fail(
sub {
my $fail = shift;
say "Failed: $fail";
}
);
}
foreach => \#URLs;
$loop->await($future);
Your example really works well without any proxy, I tested and did some changes:
Fetching URL: '. $url;
$http->GET($url)->on_done(
sub {
my $response = shift;
say "Response: ".$response->code();
}
)->on_fail(
sub {
my $fail = shift;
say "Failed: " . $fail;
}
);
Output:
Fetching URL: http://cnn.com
Response: 200
Fetching URL: http://google.com
Response: 302
Fetching URL: http://sdfsdfsdf24.com
Response: 403
Fetching URL: http://msn.net
Response: 200
As this example is not doing async call's the URL's are on a queue and being processed one by one.
Behind the scenes when you are doing a request to a target in your case to some URL's, at the low level the connection is made through a socket connection.
If you have a proxy which is not configured between your script and the intenet, there is no connection and it will raise an exception and your script will die like:
Fetching URL: http://cnn.com
Failed: Timed out
The variable $! is set and the error "Operation now in progress" appears, in fact your request didn't established any connection it just tried to establish one without success.
There are some points which you can check for example:
1 - Is the proxy working ?
2 - Do I have internet connection ?
3 - Is the URL I am testing working ?
If you are having problems with proxy, your script need a small adjust that you can get more info in the docs:
$http->configure( proxy_host => 'xx.xx.xx.xx');
$http->configure( proxy_port => 1234);
Supposing that your proxy is configured, you can check if you have fully access to the internet and aim some target like that URL's.
Trying to access the URLs it will provide you a response code and depending on the code you can do something.
As an alternative solution you could use LWP::UserAgent to make simple requests and check the response code.
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->env_proxy;
my $response = $ua->get('http://search.cpan.org/');
if ($response->is_success) {
print $response->decoded_content; # or whatever
}
else {
die $response->status_line;
}
And even with some bad stats like 4XX for example Net::Async::HTTP won't be friendly to use this module for a simple purpose as it can't handle the exceptions like you want.

POST to a web page with request headers in Perl

I am trying to post data to a page with Perl, but the page also requires headers. How would I post the headers and send headers (cookies, user agents, etc)?
I tried using LWP::UserAgent, but I couldn't figure out how to send the headers even though I could post to the page.
One more thing about this topic. When I posted on that page and printed the response content I could see the html just fine except the numbers that were supposed to show.
Try doing this :
use LWP::UserAgent;
use HTTP::Request;
my $userAgent = LWP::UserAgent->new();
my $request = HTTP::Request->new(
POST => "http://domain.tld/path"
);
$request->content("stuff=foobar");
$request->content_type("application/x-www-form-urlencoded");
my $response = $userAgent->request($request);
if ($response->code == 200) {
print $response->as_string;
}
else {
die $response->status_line;
}

How can I log in to YouTube using Perl?

I am trying to write a Perl script to connect to me YouTube account but it doesnt seem to work. Basically I just want to connect to my account but apparently it is not working. I don't even have an idea on how I could debug this! Maybe it is something related to https protocol?
Please enlighten me! Thanks in advance.
use HTTP::Request::Common;
use LWP::UserAgent;
use strict;
my $login="test";
my $pass = "test";
my $res = "";
my $ua = "";
# Create user agent, make it look like FireFox and store cookies
$ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20051213 Firefox/1.0.7");
$ua->cookie_jar ( {} );
# Request login page
$res = $ua->request(GET "https://www.google.com/accounts/ServiceLogin?service=youtube&hl=en_US&passive=true&ltmpl=sso&uilel=3&continue=http%3A//www.youtube.com/signup%3Fhl%3Den_US%26warned%3D%26nomobiletemp%3D1%26next%3D/index");
die("ERROR1: GET http://www.youtube.com/login\n") unless ($res->is_success);
# Now we login with our user/pass
$res = $ua->request(
POST "https://www.google.com/accounts/ServiceLoginAuth?service=youtube",
Referer => "http://www.youtube.com/login",
Content_Type => "application/x-www-form-urlencoded",
Content => [
currentform => "login",
next => "/index",
username => $login,
password => $pass,
action_login => "Log+In"
]
);
# YouTube redirects (302) to a new page when login is success
# and returns OK (200) if the login failed.
#die("ERROR: Login Failed\n") unless ($res->is_redirect());
print $res->content;
what i am doing is learning the web features of perl, so i dont want to use any library except wwwlib or mechanize to get the job done.
how can i just connect to my account using a perl script? this is my objective for now
hope someone can post a script or correct mine.
thank you guys for you help.
i am testing Webscarab now..
What data are you trying to grab? Why not just using an existing implementation like WebService::YouTube
Some comments on your code: I always avoided the shortcut $ua->request(GET/POST) method since I always ended up needing more flexibility that only the use of HTTP::Request and HTTP::Response allowed. I always felt the code was cleaner that way too.
Why is your code not working? Who knows.
Make sure your cookiejar is adding your cookies to the outgoing HTTP::Request. I'd suggest dumping all your headers when you do it in a browser and compare with the headers and data that libwww is sending. There may be some additional fields that they are checking for that vary for every hit. They may be checking for your UserAgent string. If you are just looking to learn libwww I'd suggest using a different site as a target as I'm sure YouTube has all sort of anti-scripting hardening.
Are you using YouTube's stable documented API?
Use an HTTP proxy such as WebScarab to watch the data flow.
Trey's suggestion to use somebody else's CPAN package for the mechanics is a good idea too.
Right right by and large, what you want to do is define a cookiejar for most of these websites that have a redirection login. This is what the package has done. Also the package tunes a lot of the lookups and scrapes based on the youtube spec.
Ajax content for example will be rough since its not there when your scraping
You just picked a somewhat rough page to start out with.
Enjoy
I'm actually working on this issue myself. Before, I would suggest read over this the API guide from Google as a good starting reference. If I'm reading it correctly, one begins with passing user credentials through a REST interface to get a Authentication Token. To handle that, I'm using the following:
sub getToken {
my %parms = #_;
my $response = LWP::UserAgent->new->post(
'https://www.google.com/youtube/accounts/ClientLogin',
[
Email => $parms{'username'},
Passwd => $parms{'password'},
service => "youtube",
source => "<<Your Value Here>>",
]
);
my $content = $response->content;
my ($auth) = $content =~ /^Auth=(.*)YouTubeUser(.*)$/msg
or die "Unable to authenticate?\n";
my ($user) = $content =~ /YouTubeUser=(.*)$/msg
or die "Could not extract user name from response string. ";
return ($auth, $user);
}
And I call that from the main part of my program as such:
## Get $AuthToken
my ($AuthToken, $GoogleUserName) = getToken((
username => $email, password => $password
));
Once I have these two things -- $AuthToken and $GoogleUserName, I'm still testing the LWP Post. I'm still writing this unit:
sub test {
my %parms = #_;
## Copy file contents. Use, foy's three param open method.
my $fileSize = -s $parms{'File'};
open(VideoFile, '<', "$parms{'File'}") or die "Can't open $parms{'File'}.";
binmode VideoFile;
read(VideoFile, my $fileContents, $fileSize) or die "Can't read $parms{'File'}";
close VideoFile;
my $r = LWP::UserAgent->new->post(
"http://uploads.gdata.youtube.com/feeds/api/users/$parms{'user'}/uploads",
[
Host => "uploads.gdata.youtube.com",
'Authorization' => "AuthSub token=\"$parms{'auth'}\"",
'GData-Version' => "2",
'X-GData-Key' => "key=$YouTubeDeveloperKey",
'Slug' => "$parms{'File'}",
'Content-Type' => "multipart/related; boundary=\"<boundary_string>\"",
'Content-Length' => "<content_length>",
'video_content_type'=> "video/wmv",
'Connection' => "close",
'Content' => $fileContents
]
);
print Dumper(\$r->content)
}
And that is called as
&test((auth=>$Auth, user=>$user, File=>'test.wmv'));