Perl Mechanize: making a script to log in to a webpage

I'm making a script to log in to a webpage automatically. It starts with this:
use HTTP::Cookies;
use WWW::Mechanize;
my $cookie_jar = HTTP::Cookies->new;
my $agent = WWW::Mechanize->new( cookie_jar => $cookie_jar );
my $server_endpoint = "http://10.11.5.2/index.php";
$agent->post($server_endpoint,[tg => 'login', referer => 'index.php',login => 'login',sAuthType=>'LOL',nickname=>'admin',password=>'012345678',submit=>'Login']);
print "Set Cookie Jar?\n", $agent->cookie_jar->as_string, "\n";
print $agent->content;
And I get a page saying "you are not logged in", but when I use the same credentials in the browser everything works.
So I retrieved the value of the cookie sent by the server (located in the Set-Cookie header of the response) with $agent->cookie_jar->as_string; here it is OV3176019645=3inkmpee0r5gpfm41c3iltvda1.
Then I put it in the POST request before sending it, like this:
use HTTP::Cookies;
use WWW::Mechanize;
my $cookie_jar = HTTP::Cookies->new;
my $agent = WWW::Mechanize->new( cookie_jar => $cookie_jar );
my $server_endpoint = "http://10.11.5.2/index.php";
$agent->add_header( Cookie => 'OV3176019645=osovm5u0vfc2dmkuo6bqn6hah1' );
$agent->post($server_endpoint,[tg => 'login', referer => 'index.php',login => 'login',sAuthType=>'LOL',nickname=>'admin',password=>'012345678',submit=>'Login']);
print $agent->content;
This time everything works...
So my question is: how can I automatically get the value of the cookie given by the server before I send my request?
Another problem is that the server sends back a cookie in the following form (in the Set-Cookie header):
Set-Cookie3: OV3176019645=3inkmpee0r5gpfm41c3iltvda1; path="/"; domain=10.11.5.2; path_spec; discard; version=0
And I just need the 1st item of this cookie (OV3176019....).
I hope I was clear in my explanations.
Thanks
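One possible approach, sketched here under assumptions rather than confirmed against this server: request the page once with the same agent before posting, so that the session cookie lands in the cookie jar and is sent back automatically with the login POST. The name/value pair can then be read out of the jar with HTTP::Cookies' scan method instead of parsing the as_string output. The helper names login_with_session and cookie_values are hypothetical, and the demonstration fills the jar by hand so that it runs without the server.

```perl
use strict;
use warnings;
use HTTP::Cookies;

# Hypothetical helper: fetch the page first so the server's session cookie
# lands in the agent's jar, then post the login form with the same agent.
# $agent is expected to be a WWW::Mechanize (or LWP::UserAgent) object.
sub login_with_session {
    my ($agent, $url, $form) = @_;
    $agent->get($url);                  # the server may reply with Set-Cookie
    return $agent->post($url, $form);   # the jar sends the cookie back
}

# Hypothetical helper: return a hash ref of cookie name => value pairs.
sub cookie_values {
    my ($jar) = @_;
    my %values;
    $jar->scan(sub {
        my ($version, $key, $val) = @_;   # remaining attributes are ignored
        $values{$key} = $val;
    });
    return \%values;
}

# Offline demonstration with a cookie shaped like the one in the question.
my $jar = HTTP::Cookies->new;
$jar->set_cookie(0, 'OV3176019645', '3inkmpee0r5gpfm41c3iltvda1',
                 '/', '10.11.5.2', undef, 1, 0, undef, 1);
my $cookies = cookie_values($jar);
print "$_=$cookies->{$_}\n" for sort keys %$cookies;
```

With a real agent, cookie_values($agent->cookie_jar) would return only the first item of the cookie (name and value), without path, domain or the other attributes.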

Related

Perl: Some websites block non-browser requests. But how?

I'm writing a simple Perl script that fetches some pages from different sites. It's very non-intrusive. I don't hog a server's bandwidth. It retrieves a single page without loading any extra JavaScript, images, or style sheets.
I use LWP::UserAgent to retrieve the pages. This works fine on most sites, but some sites return a 403 (Forbidden) error. The same pages load perfectly fine in my browser. I have inspected the request headers from my web browser and copied them exactly when trying to retrieve the same page in Perl, and every single time I get a 403 error. Here's a code snippet:
use strict;
use LWP::UserAgent;
use HTTP::Cookies;
my $URL = "https://www.betsson.com/en/casino/jackpots";
my $browserObj = LWP::UserAgent->new(
ssl_opts => { verify_hostname => 0 }
);
# $browserObj->cookie_jar( {} );
my $cookie_jar = HTTP::Cookies->new();
$browserObj->cookie_jar( $cookie_jar );
$browserObj->agent( "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0");
$browserObj->timeout(600);
push @{ $browserObj->requests_redirectable }, 'POST';
my @header = ( 'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding' => 'gzip, deflate, br',
'Accept-Language' => 'en-US,en;q=0.5',
'Connection' => 'keep-alive',
'DNT' => '1',
'Host' => 'www.bettson.com',
'Upgrade-Insecure-Requests' => '1'
);
my $response = $browserObj->get( $URL, @header );
if( $response->is_success ) {
print "Success!\n";
} else {
print "Unsuccessfull...\n";
}
How do these servers distinguish between a real browser and my script? At first I thought they had some JavaScript trickery going on, but then I realized in order for that to work, the page has to be loaded by a browser first. But I immediately get this 403 Error.
What can I do to debug this?
While 403 is a typical response when bot detection kicks in, bot detection is not the cause of the problem in this case. The cause is a typo in your code:
my $URL = "https://www.betsson.com/en/casino/jackpots";
...
'Host' => 'www.bettson.com',
In the URL the domain name is www.betsson.com, and this should be reflected in the Host header. But your Host header is slightly different: www.bettson.com. Since the Host header has the wrong name, the request is rejected with 403 Forbidden.
In fact, none of this extra work is needed, since it looks like no bot detection is done at all. There is no need to set the user agent or fiddle with the headers; a plain request works:
my $browserObj = LWP::UserAgent->new();
my $response = $browserObj->get($URL);
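A mismatch like this can be caught before the request is sent. The following is a minimal sketch, assuming the URI module is available (it is installed alongside LWP); the helper name host_header_matches is hypothetical, and ports in the Host header are not handled.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: true when the Host header (if one is given) names
# the same host as the URL. Header names are compared case-insensitively.
sub host_header_matches {
    my ($url, %headers) = @_;
    my ($host_key) = grep { lc($_) eq 'host' } keys %headers;
    return 1 unless defined $host_key;   # no explicit Host header to get wrong
    return lc($headers{$host_key}) eq lc(URI->new($url)->host) ? 1 : 0;
}

my $url = 'https://www.betsson.com/en/casino/jackpots';
print host_header_matches($url, Host => 'www.bettson.com')
    ? "Host header matches the URL\n"
    : "Host header does not match the URL\n";
```

Leaving the Host header out altogether avoids the problem, because LWP derives it from the URL.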

I can't connect using an api

I'm quite new to APIs, so I don't know if this should be more straightforward.
I wrote the following Perl script:
use strict;
use LWP::UserAgent;
require HTTP::Request;
my $request = HTTP::Request->new(GET => 'http://api.elsevier.com/content/ev/results?apiKey=1234&query=stress&database=c&updateNumber=1&pageSize=1');
my $ua = LWP::UserAgent->new;
my $response = $ua->request($request);
Then, when I get my response and print it in the debugger, I get the following:
HTTP::Response=HASH(0x9aedff8)
'_content' => '{"service-error":{"status":{"statusCode":"AUTHENTICATION_ERROR","statusText":"Requestor configuration settings insufficient for access to this resource."}}}'
'_headers' => HTTP::Headers=HASH(0x9aedfe8)
'allow' => 'GET'
'client-date' => 'Wed, 29 Mar 2017 08:08:25 GMT'
'client-peer' => '198.185.19.118:80'
'client-response-num' => 1
'content-length' => 156
'content-type' => 'application/json;charset=UTF-8'
'date' => 'Wed, 29 Mar 2017 08:08:24 GMT'
'p3p' => 'CP="IDC DSP LAW ADM DEV TAI PSA PSD IVA IVD CON HIS TEL OUR DEL SAM OTR IND OTC"'
'server' => 'api.elsevier.com 9999'
'vary' => 'Origin'
'x-cnection' => 'close'
'x-els-apikey' => 'e688c9db4db0386581dbe4c4dda46164'
'x-els-reqid' => '0000015b190d89fe-a0d0'
'x-els-status' => 'AUTHENTICATION_ERROR(Requestor configuration settings insufficient for access to this resource.)'
'x-els-transid' => 'cbf787b4-d171-4e35-8237-8cab3c931205'
'x-re-ref' => '1 1490774904423414'
'_msg' => 'Forbidden'
'_protocol' => 'HTTP/1.1'
'_rc' => 403
'_request' => HTTP::Request=HASH(0x9fc3000)
'_content' => ''
'_headers' => HTTP::Headers=HASH(0x9ae73e0)
'user-agent' => 'libwww-perl/5.831'
'_method' => 'GET'
'_uri' => URI::http=SCALAR(0x9e25188)
-> 'http://api.elsevier.com/content/ev/results?apiKey=e688c9db4db0386581dbe4c4dda46164&query=stress&database=c&updateNumber=1&pageSize=1'
'_uri_canonical' => URI::http=SCALAR(0x9e25188)
-> REUSED_ADDRESS
One of the notable lines is
'x-els-status' => 'AUTHENTICATION_ERROR(Requestor configuration settings insufficient for access to this resource.)'
I don't know how to get a proper response text. I tried searching their website for examples, but I can't seem to find any. I'm also not sure whether the key is only for Scopus and not for Engineering Village, which is what I'm trying to use.
Their website is here: https://dev.elsevier.com/index.html?utm_expid=89327795-0.AtRZzToKQ2u1mZEyQ3n7OQ.0&utm_referrer=https%3A%2F%2Fdev.elsevier.com%2Ftecdoc_ev_retrieval_request.html
Any help would be appreciated.
To get the text out of your response, you need to call the $response->decoded_content method. That will give you the JSON string that you can see in _content in your debug output. I've indented it to make it easier to read.
{
"service-error" : {
"status" : {
"statusCode" : "AUTHENTICATION_ERROR",
"statusText" : "Requestor configuration settings insufficient for access to this resource."
}
}
}
You can use the JSON module to decode this into a Perl data structure.
use JSON 'from_json';
my $res = $ua->request($req);
my $json = from_json( $res->decoded_content );
The error message you get back clearly states that you are not authenticated properly. I've looked at this guide from the documentation you mentioned. It seems that the apiKey URL param works, if you have the right type of account. You should check with whoever made that account for you, or if that was you and you're not sure, the account manager at that service that is working with you. They'll tell you if you are using the right API key, and if this method of authentication works for you.
Since this API also lets you authenticate with a custom header, X-ELS-APIKey: [apikey], I would suggest using that. Your API key is a secret, and you shouldn't share it with anyone. It's like a password. If you put it into the URL, it might show up in log files. As a header, it usually does not.
This is how you add a custom header to an HTTP request. Make sure you don't have the apiKey URL param any more if you do this.
my $req = HTTP::Request->new( GET => $url ); # no apiKey=123 here!
$req->header( 'X-ELS-APIKey' => 123 );
Now as a last step, you should check the HTTP status code of the response. A 200 (or most other codes that start with 2) means the request was successful. The 403 that you are getting back means Forbidden, which also hints that you are not authenticated correctly or not allowed to access this resource.
Since it seems that this API returns JSON in both success and failure cases, you might need to decode it for both. If you care to examine the failure response, that makes sense. If not, you can skip that part. To do this, use $res->is_success, which is also used in the synopsis of the LWP::UserAgent documentation.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use JSON 'from_json';
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new( GET => 'http://api.elsevier.com/content/ev/results?query=stress&database=c&updateNumber=1&pageSize=1' );
$req->header( 'X-ELS-APIKey' => 123 );
my $res = $ua->request($req); # actually send the request
if ($res->is_success) {
my $json = from_json( $res->decoded_content );
# ... do stuff with the response
} else {
# something went wrong
}
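For the failure branch, the error details can be pulled out of the JSON body. This is a minimal sketch that uses the core JSON::PP module in place of JSON (an assumption made so the example has no extra dependencies) and feeds it the error body from the question instead of a live response; the helper name service_error is hypothetical.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: turn an error body of the shape shown in the question
# into "CODE: text". Returns nothing if the body does not have that shape.
sub service_error {
    my ($body) = @_;
    # eval guards against invalid JSON and unexpected structures
    my $status = eval { decode_json($body)->{'service-error'}{status} };
    return unless ref($status) eq 'HASH';
    return "$status->{statusCode}: $status->{statusText}";
}

# Error body copied from the debugger output in the question.
my $body = '{"service-error":{"status":{"statusCode":"AUTHENTICATION_ERROR","statusText":"Requestor configuration settings insufficient for access to this resource."}}}';
print service_error($body), "\n";
```

In the script above, this would be called as service_error( $res->decoded_content ) inside the else branch.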

Perl and LWP not authenticating

I'm trying to get an LWP request working to an https server. I have been given a user & pass, advised to use basic authentication. I've tried various chunks of code, and all seem to get an authentication error. My current code is...
use warnings;
use strict;
use Data::Dumper;
use LWP;
my $ua = LWP::UserAgent->new( keep_alive => 1 );
##also tried by $ua->credentials('domain','','user','pass');
##not sure if I need 'realm' or how I get it, as no popup on screen.
my $request = HTTP::Request->new( GET => "https://my.url.com/somepath/" );
$request->authorization_basic('myuser','mypass');
$request->header( 'Cache-Control' => 'no-cache' );
my $response = $ua->request($request);
print $response->content;
print Dumper $response;
The server gives a security error, but if I look at a dump of $response, I see the following...
'_rc' => '401',
'_headers' => bless( { .... lots of stuff
'title' => 'Security Exception',
'client-warning' => 'Missing Authenticate header',
'client-ssl-socket-class' => 'IO::Socket::SSL',
...
'expires' => '-1'
}, 'HTTP::Headers' ),
'_msg' => 'Unauthorized',
'_request' => bless( {
'_content' => '',
'_uri' => bless( do{\(my $o = 'https:theurlabove')}, 'URI::https' ),
'_method' => 'GET',
'_uri_canonical' => $VAR1->{'_request'}{'_uri'}
'_headers' => bless( {
'user-agent' => 'libwww-perl/6.04',
'cache-control' => 'no-cache',
'authorization' => 'Basic dzx..........'
}, 'HTTP::Headers' ),
I'm trying to understand what's happening. It looks like the original request has the headers in there, but the response says I'm 'Missing Authenticate header'.
Is there something amiss with the code, or something I'm misunderstanding about the request/response?
Thanks.
The "Missing Authenticate header" message is coming from LWP itself. This means that it couldn't find an authenticate header in the target response. This might mean that your proxy settings are misconfigured, if you have anything like that.
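To see whether the server sent a challenge at all, the WWW-Authenticate header of the 401 response can be inspected; when it is present, it also carries the realm that $ua->credentials expects. The following is a minimal sketch with hand-built HTTP::Response objects (no network involved); the helper name auth_challenge and the realm value are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: return the server's authentication challenge, or
# undef when the response is not a 401 or has no WWW-Authenticate header.
sub auth_challenge {
    my ($res) = @_;
    return undef unless $res->code == 401;
    return scalar $res->header('WWW-Authenticate');
}

# Two hand-built 401 responses: one with a challenge, one without.
my $with    = HTTP::Response->new(401, 'Unauthorized',
                                  [ 'WWW-Authenticate' => 'Basic realm="example"' ]);
my $without = HTTP::Response->new(401, 'Unauthorized');

for my $res ($with, $without) {
    my $challenge = auth_challenge($res);
    print defined($challenge) ? "challenge: $challenge\n" : "no challenge header\n";
}
```

A 401 without this header, as in the second case, is the situation in which LWP adds the client-warning seen in the dump.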
I don't know if this is what you are looking for but I came across the same problem trying to authenticate to a webpage and had to solve it with WWW::Mechanize. I had to go to the first page and login then request the page I wanted.
use WWW::Mechanize;
my $loginPage = "http://my.url.com/login.htm"; # Authentication page
my $mech = WWW::Mechanize->new(); # Create new browser object
$mech->get($loginPage); # Go to login page
$mech->form_name('LogonForm'); # Select the form named LogonForm
$mech->field("username", 'myuser'); # Fill out username field
$mech->field("password", 'mypass'); # Fill out password field
$mech->click("loginloginbutton"); # submit form
$mech->get("http://my.url.com/somepath/"); # Get webpage
# Some more code here with $mech->content()

Unable to log in to a site using WWW::Mechanize

I am using WWW::Mechanize to try to log in to a site.
Code
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->get("https://www.amazon.com/gp/css/homepage.html/");
$mech->submit_form(
form_name => 'yaSignIn',
fields => {
email => 'email',
qpassword=> 'pass'
}
);
print $mech->content();
However, it does not log in to the site. What am I doing wrong? The website redirects and says "please enable cookies to continue". How do I do that?
Try putting this block before your get.
$mech->cookie_jar(
HTTP::Cookies->new(
file => "cookies.txt",
autosave => 1,
ignore_discard => 1,
)
);
SuperEdit2: I just tried this myself and it seemed to work. Give it a try. (I changed the form number to 3 and added an agent alias.)
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
# Create a new instance of Mechanize
my $bot = WWW::Mechanize->new();
$bot->agent_alias( 'Linux Mozilla' );
# Create a cookie jar for the login credentials
$bot->cookie_jar(
HTTP::Cookies->new(
file => "cookies.txt",
autosave => 1,
ignore_discard => 1,
)
);
# Connect to the login page
my $response = $bot->get( 'https://www.amazon.com/gp/css/homepage.html/' );
# Get the login form. You might need to change the number.
$bot->form_number(3);
# Enter the login credentials.
$bot->field( email => 'email' );
$bot->field( password => 'pass' );
$response = $bot->click();
print $response->decoded_content;

How to ignore 'Certificate Verify Failed' error in perl?

I want to access a website whose certificate cannot be verified. I'm making a GET request with WWW::Mechanize. How would I go about ignoring this error and continuing to connect to the website?
use IO::Socket::SSL qw();
use WWW::Mechanize qw();
my $mech = WWW::Mechanize->new(ssl_opts => {
SSL_verify_mode => IO::Socket::SSL::SSL_VERIFY_NONE,
verify_hostname => 0, # this key is likely going to be removed in future LWP >6.04
});
With IO::Socket::SSL earlier than 1.79, see PERL_LWP_SSL_VERIFY_HOSTNAME.
my $mech = WWW::Mechanize->new( 'ssl_opts' => { 'verify_hostname' => 0 } );