How do I use Perl's LWP to log in to a web application? - perl

I would like to write a script to login to a web application and then move to other parts
of the application:
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
use Data::Dumper;
$ua = LWP::UserAgent->new(keep_alive=>1);
my $req = POST "http://example.com:5002/index.php",
[ user_name => 'username',
user_password => "password",
module => 'Users',
action => 'Authenticate',
return_module => 'Users',
return_action => 'Login',
];
my $res = $ua->request($req);
print Dumper(\$res);
if ( $res->is_success ) {
print $res->as_string;
}
When I try this code I am not able to login to the application. The HTTP status code returned is 302 that is found, but with no data.
If I post username/password with all required things then it should return the home page of the application and keep the connection live to move other parts of the application.

You may be able to use WWW::Mechanize for this purpose:
Mech supports performing a sequence of page fetches including following links and submitting forms. Each fetched page is parsed and its links and forms are extracted. A link or a form can be selected, form fields can be filled and the next page can be fetched. Mech also stores a history of the URLs you've visited, which can be queried and revisited.

I'm guessing that LWP isn't following the redirect:
push #{ $ua->requests_redirectable }, 'POST';
Any reason why you're not using WWW::Mechanize?

I've used LWP to log in to plenty of web sites and do stuff with the content, so there should be no problem doing what you want. Your code looks good so far but two things I'd suggest:
As mentioned, you may need to make the requests redirectable
You may also need to enable cookies:
$ua->cookie_jar( {} );
Hope this helps

Related

perl jira rest api port number

my $jira = JIRA::REST->new({
url => 'https://something.com:8443',
username => 'username',
password => 'password',
session => 1,
});
The above code doesn't work and fails with below error probably due to port number at the end
Can't connect to something.com:8443 (Bad file descriptor)
is there a way/variable to mention the port number?
You need to add the setting ssl_verify_none => 1 into the hash used in your constructor.
The underlying LWP code allows you to not verify the certs of systems you connect to (which is not recommended for production), or it also allows you to specify a Certificate Authority (CA) cert that can be used to verify certs of systems you connect to. It looks like JIRA::REST has only supported the first option.
You might be better off just using the underlying LWP code, like this:
use LWP::UserAgent;
my $ua = LWP::UserAgent->new(
ssl_opts => { verify_hostname => 1, SSL_ca_file => 'myCA.cer' },
protocols_allowed => ['https'],
);
my $req = HTTP::Request->new(
GET => 'https://something.com:8443/rest/api/latest/issue/ABC-123',
);
$req->authorization_basic('username','password');
my $res = $ua->request($req);
It looks like JIRA::REST is just providing the raw JSON response to you anyway, so it's not really saving you all that much processing.
The main advantage of REST::Client is that it saves some stuff for you to add to each request implicitly. There's nothing particularly REST-y or helpful beyond sending a request and giving its response back to you.
The JIRA::Client has a few advantages, though, since it knows how to get a session cookie and properly attach files, and, more importantly, deal with paginated results. But often, when I'm doing things with Jira, I want more power.
Back in LWP's heyday, it was very frustrating to track transactions: a request-response pair. You could, but you had to manage it yourself. And, there weren't hooks in the process, so you had to create everything every time.
Then, LWP tried to work around some SSL issues (verifying host names, etc) and also split out LWP::Protocol::https. That's not a big deal when you understand that, but even though I do, it's something I have to remember every time I want to use LWP for something. There are reasons for everything that happened, but that doesn't make it any less annoying. That leads to the sort of work jimtut showed in his answer. Every time. But, it's a small speed bump on the way to insecurity.
I like Mojolicious much more because it represents complete transactions but also has hooks (well, events) that allow you to fiddle with things automatically while the process is chugging along.
Here's an example from Mojo Web Clients that shows me creating a user-agent for each service and setting some stuff for each transaction. I can adjust the request any way that I please before it does its work (and this is mostly what that REST::Client and JIRA::Client are doing for you):
my $travis_ua = Mojo::UserAgent->new();
$travis_ua->on( start => sub {
my( $ua, $tx ) = #_;
$tx->req->headers->authorization(
"token $ENV{TRAVIS_API_KEY}" );
$tx->req->headers->accept(
'application/vnd.travis-ci.2.1+json' );
} );
my $appveyor_ua = Mojo::UserAgent->new();
$appveyor_ua->on( start => sub {
my( $ua, $tx ) = #_;
$tx->req->headers->authorization(
"Bearer $ENV{APPVEYOR_API_KEY}" );
} );
In the Basic auth case, it's just a different value in that header:
use Mojo::Util qw(b64_encode);
my $jira_ua = Mojo::UserAgent->new();
$jira_ua->on( start => sub {
my( $ua, $tx ) = #_;
$tx->req->headers->authorization(
'Basic ' . b64_encode( join ':', $username, $password ) );
} );
Now, when I use those user-agents, the auth stuff is automatically added:
my $tx = $travis_ua->get( $url );
And, that $tx gives me access to the request and the response, so I don't need REST::Client to handle that for me either.
Since Mojolicious is handling all of this in one convenient package, I don't have to wrangle different objects. As such, there's not much left over that REST::Client can do for me.

Perl modules for web automation

My situation: Want to create a perl script to automate web login page. I need also to be able to pass in (POST) components such as HTTP Headers, i.e X-forwarded-for, username, password, csrf tokens and the likes. I know of Mechanize, but which other modules can I use to do the mentioned? Can LWP do it?
Library recommendations are off-topic, so I'll focus the question about LWP.
Yes, LWP::UserAgent (and its subclass WWW::Mechanize) can be used to send arbitrary headers.
$ua->request takes a request object with custom headers. You can create this object and perform the request as follows:
use HTTP::Request::Common qw( GET );
my $request = GET($url,
HeaderName => 'HeaderValue',
HeaderName => 'HeaderValue',
);
my $response = $ua->request($request);
LWP:UserAgent provides a shorthand for this.
my $response = $ua->get($url,
HeaderName => 'HeaderValue',
HeaderName => 'HeaderValue',
);

Logging in using WWW::Mechanize

I'm looking at logging in to https://imputationserver.sph.umich.edu/index.html#!pages/login
with the following:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use feature 'say';
use autodie ':all';
use WWW::Mechanize;
use DDP;
my $mech = WWW::Mechanize->new();
$mech->get( 'https://imputationserver.sph.umich.edu/index.html#!pages/login' );
my $username = '';
my $password = '';
#$mech->set_visible( $username, $password );
#$mech -> field('Username:', $username);
#$mech -> field('Password:', $password);
my %data;
#{ $data{links} } = $mech -> find_all_links();
#{ $data{inputs} } = $mech -> find_all_inputs();
#{ $data{submits} } = $mech ->find_all_submits();
#{ $data{forms} } = $mech -> forms();
p %data;
#$mech->set_fields('Username' => $username, 'Password' => $password);
but there doesn't appear to be any useful information, which is shown by printing:
{
forms [],
inputs [],
links [
[0] WWW::Mechanize::Link {
public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
private methods (0)
internals: [
[0] "favicon.ico",
[1] undef,
[2] undef,
[3] "link",
[4] URI::https,
[5] {
href "favicon.ico",
rel "icon"
}
]
},
[1] WWW::Mechanize::Link {
public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
private methods (0)
internals: [
[0] "assets/css/loader.css",
[1] undef,
[2] undef,
[3] "link",
[4] var{links}[0][4],
[5] {
href "assets/css/loader.css",
rel "stylesheet"
}
]
}
],
submits []
}
I looked on Firefox's Tools -> page info, but got nothing valuable there, I don't see where the username and password are coming from on this page.
I've tried
$mech -> submit_form(
form_number => 0,
fields => { username => $username, password => $password },
);
but then I get No form defined
In terms of links, inputs, fields, I don't see any, and I don't know how to move on.
I don't see anything on https://metacpan.org/pod/WWW::Mechanize::Examples that helps me out in this situation.
How can I log in to this page using Perl's WWW::Mechanize?
As Dave says, many modern websites are going to be handling login via a Javascript-driven (private) API. You'll need to open the Network tab in your browser, do the login manually as you normally would, and watch the sequence of GETs, PUTs, POSTs, etc. that happen to see what interaction is needed to complete a login, and then execute that sequence yourself with Mech or LWP.
It's possible that the Javascript on the page is going to create JSON or even JWTs to do the interactions; you'll have to duplicate that in your code for it to work.
In particular, check the headers for cookies, and authentication and CSRF tokens being set; you'll need to capture those and re-send them with requests (POST requests will need the CSRF tokens). This may entail doing more interactions with the site to capture the sequence of operations and duplicate them. HTTP::Cookies should handle the cookies for you automatically, but more sophisticated header usage will require you to use HTTP::Headers to extract the data and possibly resubmit it that way.
At heart, the processes are all pretty simple; it's just a matter of accurately replicating them so that you can automate them.
You should check as to whether the site already has a programmer's API, and use that if so; such an API will almost always provide you simpler, direct interfaces to site functions and easier-to-use returned data formats. If the site is highly dynamic, like a heavy React site, it's possible that other pages in the site are going to load a skeletal HTML page and then use Javascript to fill it out as well; as the page evolves, your code will have to as well. If you're using a defined programmer's API, you will probably be able to depend on the interactions and returned data remaining the same as long as the API version doesn't change.
A final note: you should verify that you're not violating your user agreement by using automation. Some sites explicitly bar using automated methods of logging in.
The interesting part of the source from that page is this:
<body class="bg-light">
<div id="main">
<div class="spinner">
<div class="bounce1"></div>
<div class="bounce2"></div>
<div class="bounce3"></div>
</div>
</div>
<script src="./dist/bundles/cloudgene/index.js"></script>
</body>
So, there's no login form in the HTML that makes up that page. Which explains why WWW::Mechanize can't see anything - there's nothing there to see.
It seems that that the page is all built by that Javascript file - index.js.
Now, you could spend hours reading that JS and working exactly how the page works. But that'll be hard work and there's an easier way.
No matter how the client (the browser or your code) works, the actual login must be handled by an HTTP request and response. The client sends a request, the server responds and the client acts on that response. You just need to work out what the request and response look like and then reproduce that in your code.
And you can examine the HTTP requests and response using tools that are almost certainly built into your browser (in Chrome, it's dot menu -> more tools -> developer tools). That will allow you to see exactly what the HTTP request looks like.
Having done that, you "just" need to craft a similar response using your Perl code. You'll probably find that's easier using LWP::UserAgent and its associated modules rather than WWW::Mechanize.
WWW::Mechanize is a web client with some HTML parsing capabilities. But as Dave Cross pointed out, the form you want is not in the HTML document you requested. It's generated by some JavaScript code. To do what the browser does would require a JavaScript engine, which WWW::Mechanize doesn't have.
The simplest way to achieve that is to remote-control a web browser (e.g. using Selenium::Chrome).
The other is to manually craft the login request without getting and filling the form.
Looking at your code, I see the following URL:
https://imputationserver.sph.umich.edu/index.html#!pages/login
It is this part in particular that drew my attention: #!pages/login
This likely means that the login form is not present on the page when it is loaded, and is instead added to the page with JavaScript after page load.
Your script doesn't know this, however, and looks for the login form and its elements right away after page load.
The easiest way to solve this issue is to place a hard-coded timeout of, let's say, 5 seconds between page load and trying to log in.
The more "correct" way of handling this is to wait for the login form to appear by checking for its controls, and then proceed with the login process.

Unable to obtain cookie-based access to Iframe within a HTML page

I am trying to automate sms sending through Way2sms in Perl LWP. I am able to login successfully. Then I click the Send SMS link in the browser to go the sms-sending page. From there, based upon the page url and the url of the iframe within which the sms fields are located, I try to construct the absolute URL of the page to which the form should be posted, and post it with the correct parameters (you can see it all in the image). However, the sms isn't being sent. Can anybody tell me what am I doing wrong here? (There is a similar module in CPAN which accomplishes this through Mechanize, I am trying a different approach)
use LWP::UserAgent;
use HTTP::Cookies;
my $cookie_jar = HTTP::Cookies->new(
file => "cookies.txt",
autosave => 1,
);
my $ua = LWP::UserAgent->new(
agent =>
'Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20100101 Firefox/14.0.1',
cookie_jar => $cookie_jar,
);
my $response = $ua->post(
'http://site2.way2sms.com/contentt/bar/Login1.action',
{
username => $mob,
password => $pass,
}
);
if ( $response->is_redirect ) {
$response = $ua->get( $response->header('Location') );
print 5 if $response->decoded_content =~ /Kaustav Mukherjee/i; #prints it, showing that the login is successful
}
my $smsresp = $ua->post("http://site5.way2sms.com/jsp/quicksms.action",[MobNo=>$mob,textArea=>'Hello World']);
if ( $smsresp->is_redirect ) {
$smsresp = $ua->post($smsresp->header('Location'),[MobNo=>$mob,textArea=>'Hello World']);
}
Here is an indirect answer that may help you out:
If you have to use this website, try using something like CasperJS. There may be some JavaScript magic that needs to happen.
Maybe you could use something like Google Voice or some other smarter service to automate send text messages. Looks like someone on CPAN already has a sweet Google::Voice module for this.

How to read a web page content which may itself be redirected to another url?

I'm using this code to read the web page content:
my $ua = new LWP::UserAgent;
my $response= $ua->post($url);
if ($response->is_success){
my $content = $response->content;
...
But if $url is pointing to moved page then $response->is_success is returning false. Now how do I get the content of redirected page easily?
You need to chase the redirect itself.
if ($response->is_redirect()) {
$url = $response->header('Location');
# goto try_again
}
You may want to put this in a while loop and use "next" instead of "goto". You may also want to log it, limit the number of redirections you are willing to chase, etc.
[update]
OK I just noticed there is an easier way to do this. From the man page of LWP::UserAgent:
$ua->requests_redirectable
$ua->requests_redirectable( \#requests )
This reads or sets the object's list of request names that
"$ua->redirect_ok(...)" will allow redirection for. By default,
this is "['GET', 'HEAD']", as per RFC 2616. To change to include
'POST', consider:
push #{ $ua->requests_redirectable }, 'POST';
So yeah, maybe just do that. :-)