Following links using WWW::Mechanize - perl

I am trying to access an internal webpage to start and stop an application using WWW::Mechanize. So far I am able to log in to the application successfully. My next step is to identify a particular service from a list of services and stop it.
The problem I am facing is that I am unable to follow the link on the webpage. After looking at the HTML and the link object, it is evident that the link has no real URL, only an onclick event.
Here is a snippet of the HTML:
<ul>
<li>
servicename
</li>
</ul>
The link object dump is
$VAR1 = \bless( [
'#',
'servicename',
'j_id_id1:j_id_id9:2:j_id_id10',
'a',
bless( do{\(my $o = 'http://blah.services.jsf')}, 'URI::http' ),
{
'href' => '#',
'style' => 'color:#3BB9FF;',
'name' => 'j_id_id1:j_id_id9:2:j_id_id10',
'onclick' => 'A4J.AJAX.Submit(\'j_id_id1\',event,{\'similarityGroupingId\':\'j_id_id1:j_id_id9:2:j_id_id10\',\'parameters\':{\'j_id_id1:j_id_id9:2:j_id_id10\':\'j_id_id1:j_id_id9:2:j_id_id10\',\'ajaxSingle\':\'j_id_id1:j_id_id9:2:j_id_id10\'} ,\'containerId\':\'j_id_id0\',\'actionUrl\':\'/pages/services.jsf;jsessionid=NghBSoEJZKXbWcK0uVzcHvyebl8G_zSpf_Zu4uqrLI7xosHAnheK!1108773228\'} );return false;',
'id' => 'j_id_id1:j_id_id9:2:j_id_id10'
}
], 'WWW::Mechanize::Link' );
Here is my code so far:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use WWW::Mechanize;
my $username = 'myuser';
my $password = 'mypass';
my $url = 'myinternalurl';
my $mech = WWW::Mechanize->new();
$mech->credentials($username,$password);
$mech->get($url);
my $link = $mech->find_link( text => 'servicename' );
#print Dumper \$link;
#$mech->follow_link( url => $link->url_abs() );
$mech->get($link->url_abs());
print $mech->text();
If I use follow_link, I get "Link not found at log_in.pl line 16." If I use get, I get back the same page. The problem is that all these services appear to be hyperlinks but have the same URL as my main URL.
Here is a pic of the webpage:
When I manually click a service the Operations and Properties section change which allows the user to view Operation and Properties of the service they just clicked. Every service has different set of Operations and Properties.
How should I go about doing this using Perl? Is WWW::Mechanize the wrong tool for this one? Can anyone please suggest a solution or an alternate Perl module that could help? Installing any CPAN module is not an issue. Working with the latest version of Perl is not an issue either. I have just started automating with Perl and am currently unaware of all the modules that could get the job done.
Looking forward to your guidance and help.
Note: If you feel there is any pertinent information, I may have missed, please leave a comment and I will update the question to add more details. I have modified proprietary information.

That button contains a Javascript onclick event, which will not work when using WWW::Mechanize.
Per the docs:
Please note that Mech does NOT support JavaScript, you need additional software for that. Please check "JavaScript" in WWW::Mechanize::FAQ for more.
One alternative that does support JavaScript in forms is WWW::Mechanize::Firefox.
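Even without a JavaScript engine, the onclick attribute in the link dump shows what the browser would submit. The following is a minimal sketch, assuming the attribute text looks like the one in the dump (the session id is shortened to a placeholder here). It only extracts the form id, the control id and the action URL. Which POST parameters the server actually expects is not shown in the question and would have to be confirmed in the browser's network tools.

```perl
use strict;
use warnings;

# onclick text as seen in the link dump; the jsessionid value is a placeholder.
my $onclick = <<'JS';
A4J.AJAX.Submit('j_id_id1',event,{'similarityGroupingId':'j_id_id1:j_id_id9:2:j_id_id10','parameters':{'j_id_id1:j_id_id9:2:j_id_id10':'j_id_id1:j_id_id9:2:j_id_id10'} ,'containerId':'j_id_id0','actionUrl':'/pages/services.jsf;jsessionid=EXAMPLE'} );return false;
JS

# First argument of A4J.AJAX.Submit is the id of the enclosing form.
my ($form_id)    = $onclick =~ /A4J\.AJAX\.Submit\('([^']+)'/;
# The clicked control is identified by similarityGroupingId.
my ($control_id) = $onclick =~ /'similarityGroupingId':'([^']+)'/;
# actionUrl is where the AJAX POST is sent.
my ($action_url) = $onclick =~ /'actionUrl':'([^']+)'/;

print "form:    $form_id\n";
print "control: $control_id\n";
print "action:  $action_url\n";
```

In a real script the string would come from $link->attrs->{onclick} rather than a literal.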

Related

Logging in using WWW::Mechanize

I'm looking at logging in to https://imputationserver.sph.umich.edu/index.html#!pages/login
with the following:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use feature 'say';
use autodie ':all';
use WWW::Mechanize;
use DDP;
my $mech = WWW::Mechanize->new();
$mech->get( 'https://imputationserver.sph.umich.edu/index.html#!pages/login' );
my $username = '';
my $password = '';
#$mech->set_visible( $username, $password );
#$mech -> field('Username:', $username);
#$mech -> field('Password:', $password);
my %data;
@{ $data{links} } = $mech->find_all_links();
@{ $data{inputs} } = $mech->find_all_inputs();
@{ $data{submits} } = $mech->find_all_submits();
@{ $data{forms} } = $mech->forms();
p %data;
#$mech->set_fields('Username' => $username, 'Password' => $password);
but there doesn't appear to be any useful information, which is shown by printing:
{
forms [],
inputs [],
links [
[0] WWW::Mechanize::Link {
public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
private methods (0)
internals: [
[0] "favicon.ico",
[1] undef,
[2] undef,
[3] "link",
[4] URI::https,
[5] {
href "favicon.ico",
rel "icon"
}
]
},
[1] WWW::Mechanize::Link {
public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
private methods (0)
internals: [
[0] "assets/css/loader.css",
[1] undef,
[2] undef,
[3] "link",
[4] var{links}[0][4],
[5] {
href "assets/css/loader.css",
rel "stylesheet"
}
]
}
],
submits []
}
I looked in Firefox's Tools -> Page Info but found nothing valuable there; I don't see where the username and password are coming from on this page.
I've tried
$mech -> submit_form(
form_number => 0,
fields => { username => $username, password => $password },
);
but then I get No form defined
In terms of links, inputs, fields, I don't see any, and I don't know how to move on.
I don't see anything on https://metacpan.org/pod/WWW::Mechanize::Examples that helps me out in this situation.
How can I log in to this page using Perl's WWW::Mechanize?
As Dave says, many modern websites are going to be handling login via a Javascript-driven (private) API. You'll need to open the Network tab in your browser, do the login manually as you normally would, and watch the sequence of GETs, PUTs, POSTs, etc. that happen to see what interaction is needed to complete a login, and then execute that sequence yourself with Mech or LWP.
It's possible that the Javascript on the page is going to create JSON or even JWTs to do the interactions; you'll have to duplicate that in your code for it to work.
In particular, check the headers for cookies, and authentication and CSRF tokens being set; you'll need to capture those and re-send them with requests (POST requests will need the CSRF tokens). This may entail doing more interactions with the site to capture the sequence of operations and duplicate them. HTTP::Cookies should handle the cookies for you automatically, but more sophisticated header usage will require you to use HTTP::Headers to extract the data and possibly resubmit it that way.
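As a rough sketch of the capture-and-resend step, the snippet below reads a token from one response and attaches it to the next request. The header name X-CSRF-Token, the token value and the URL are all assumptions for illustration; the real names must be taken from what the Network tab shows for the site in question. No request is sent here, so the response is built by hand as a stand-in.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Stand-in for the response to an initial GET; a real one would come from $ua->get(...).
my $first = HTTP::Response->new(
    200, 'OK',
    [ 'X-CSRF-Token' => 'example-token' ],    # assumed header name and value
);

# Pull the token out of the response headers.
my $token = $first->header('X-CSRF-Token');

# Re-send it on the follow-up POST (hypothetical URL).
my $post = HTTP::Request->new( POST => 'https://example.com/api/login' );
$post->header( 'X-CSRF-Token' => $token );

print $post->header('X-CSRF-Token'), "\n";
```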
At heart, the processes are all pretty simple; it's just a matter of accurately replicating them so that you can automate them.
You should check as to whether the site already has a programmer's API, and use that if so; such an API will almost always provide you simpler, direct interfaces to site functions and easier-to-use returned data formats. If the site is highly dynamic, like a heavy React site, it's possible that other pages in the site are going to load a skeletal HTML page and then use Javascript to fill it out as well; as the page evolves, your code will have to as well. If you're using a defined programmer's API, you will probably be able to depend on the interactions and returned data remaining the same as long as the API version doesn't change.
A final note: you should verify that you're not violating your user agreement by using automation. Some sites explicitly bar using automated methods of logging in.
The interesting part of the source from that page is this:
<body class="bg-light">
<div id="main">
<div class="spinner">
<div class="bounce1"></div>
<div class="bounce2"></div>
<div class="bounce3"></div>
</div>
</div>
<script src="./dist/bundles/cloudgene/index.js"></script>
</body>
So, there's no login form in the HTML that makes up that page. Which explains why WWW::Mechanize can't see anything - there's nothing there to see.
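This can be checked offline. The sketch below feeds the body shown above to HTML::Form, the parser WWW::Mechanize relies on for forms, and counts what it finds. The markup is copied from the page source quoted in this answer; the base URL is the page's own address.

```perl
use strict;
use warnings;
use HTML::Form;

# Body of the page as served, before any JavaScript has run.
my $html = <<'HTML';
<body class="bg-light">
<div id="main">
<div class="spinner">
<div class="bounce1"></div>
<div class="bounce2"></div>
<div class="bounce3"></div>
</div>
</div>
<script src="./dist/bundles/cloudgene/index.js"></script>
</body>
HTML

# HTML::Form->parse returns one object per <form> element in the document.
my @forms = HTML::Form->parse( $html, 'https://imputationserver.sph.umich.edu/index.html' );

printf "forms found: %d\n", scalar @forms;
```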
It seems that the page is all built by that JavaScript file, index.js.
Now, you could spend hours reading that JS and working out exactly how the page works. But that'll be hard work and there's an easier way.
No matter how the client (the browser or your code) works, the actual login must be handled by an HTTP request and response. The client sends a request, the server responds and the client acts on that response. You just need to work out what the request and response look like and then reproduce that in your code.
And you can examine the HTTP requests and responses using tools that are almost certainly built into your browser (in Chrome, it's dot menu -> more tools -> developer tools). That will allow you to see exactly what the HTTP request looks like.
Having done that, you "just" need to craft a similar request using your Perl code. You'll probably find that's easier using LWP::UserAgent and its associated modules rather than WWW::Mechanize.
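A sketch of what such a hand-built request might look like, assuming the login turns out to be a JSON POST. The URL and the field names username and password are hypothetical placeholders, not taken from the site; the real ones are whatever the developer tools show. The request is only built and printed, not sent.

```perl
use strict;
use warnings;
use HTTP::Request;
use JSON::PP;

# Hypothetical endpoint and field names: replace with what the developer tools show.
my $login_url = 'https://example.com/api/login';
my $payload   = encode_json( { username => 'myuser', password => 'mypass' } );

# Build the request by hand: method, URL, headers, body.
my $req = HTTP::Request->new(
    POST => $login_url,
    [ 'Content-Type' => 'application/json' ],
    $payload,
);

print $req->as_string;

# To send it, something like the following would be needed (not run here):
#   use LWP::UserAgent;
#   my $ua  = LWP::UserAgent->new( cookie_jar => {} );
#   my $res = $ua->request($req);
```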
WWW::Mechanize is a web client with some HTML parsing capabilities. But as Dave Cross pointed out, the form you want is not in the HTML document you requested. It's generated by some JavaScript code. To do what the browser does would require a JavaScript engine, which WWW::Mechanize doesn't have.
The simplest way to achieve that is to remote-control a web browser (e.g. using Selenium::Chrome).
The other is to manually craft the login request without getting and filling the form.
Looking at your code, I see the following URL:
https://imputationserver.sph.umich.edu/index.html#!pages/login
It is this part in particular that drew my attention: #!pages/login
This likely means that the login form is not present on the page when it is loaded, and is instead added to the page with JavaScript after page load.
Your script doesn't know this, however, and looks for the login form and its elements right away after page load.
The easiest way to solve this issue is to place a hard-coded timeout of, let's say, 5 seconds between page load and trying to log in.
The more "correct" way of handling this is to wait for the login form to appear by checking for its controls, and then proceed with the login process.

perl WWW::Mechanize can't seem to find the right form or assign fields or click submit

So I'm trying to create a perl script that logs in to SAP BusinessObjects Central Management Console (CMC) page but it doesn't even look like it's finding the right form or finding the right field or even clicking Submit.
Here's my code:
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get("http://myserver:8080/BOE/CMC");
$mech->form_name("_id2");
$mech->field("_id2:logon:CMS", "MYSERVER:6400");
$mech->field("_id2:logon:SAP_SYSTEM", "");
$mech->field("_id2:logon:SAP_CLIENT", "");
$mech->field("_id2:logon:USERNAME", "MYUSER");
$mech->field("_id2:logon:PASSWORD", "MYPWD");
$mech->field("_id2:logon:AUTH_TYPE", "secEnterprise");
$mech->click;
print $mech->content();
When I run it, I don't get any errors but the output I get is the login page again. Even more puzzling, it doesn't seem to be accepting the field values I send it (the output would display default values instead of the values I assign it). Putting in a wrong user or password doesn't change anything - no error but I just get the login page back with default values
I think the script itself is fine since I changed the necessary fields and I was able to log in to our Nagios page (the output page definitely shows Nagios details). I think the CMC page is not so simple, but I need help in figuring out how to make it work.
What I've tried:
1
use Data::Dumper;
print $mech->forms;
print Dumper($mech->forms());
What that gave me is:
Current form is: WWW::Mechanize=HASH(0x243d828)
Part of the Dumper output is:
'attr' => {
'target' => 'servletBridgeIframe',
'style' => 'display:none;',
'method' => 'post'
},
'inputs' => []
I'm showing just that part of the Dumper output because it seems that's the relevant part. When I tried the same thing with our Nagios page, the 'attr' section had a 'name' field which the above doesn't. The Nagios page also had entries for 'inputs' such as 'useralias' and 'password' but the above doesn't have any entries.
2
$mech->form_number(1);
Since I wasn't sure I was referencing the form correctly, I just had it try using the first form it finds (the page only has one form anyway). My result was the same - no error and the output is the login page with default values.
3
I messed around with escaping (with '\') the underscore (_) and colon (:) in the field names.
I've searched and didn't find anything that said I had to escape any characters but it was worth a shot. All I know is, the Nagios page field names only contained letters and it worked.
I got field names from Chrome's developer tool. For example, the User Name form field showed:
<input type="text" id="_id2:logon:USERNAME" name="_id2:logon:USERNAME" value="Administrator">
I don't know if Mechanize has a problem with names starting with underscore or names containing colons.
4
$mech->click("_id2:logon:logonButton");
Since I wasn't sure the "Log On" button was being clicked I tried to specify it but it gave me an error:
No clickable input with name _id2:logon:logonButton at /usr/share/perl5/WWW/Mechanize.pm line 1676
That's probably because there is no name defined on the button (I used the id instead) but I thought it was worth a shot. Here's the code of the button:
<input type="submit" id="_id2:logon:logonButton" value="Log On" class="logonButtonNoHover logon_button_no_hover" onmouseover="this.className = 'logonButtonHover logon_button_hover';" onmouseout="this.className = 'logonButtonNoHover logon_button_no_hover';">
There's only one button on the form anyway so I shouldn't have needed to specify it (I didn't need to for the Nagios page)
5
The interactive shell of Mechanize
Here's the output when I tried to retrieve all forms on the page:
$ perl -MWWW::Mechanize::Shell -eshell
(no url)>get http://myserver:8080/BOE/CMC
Retrieving http://myserver:8080/BOE/CMC(200)
http://myserver:8080/BOE/CMC>forms
Form [1]
POST http://myserver:8080/BOE/CMC/1412201223/admin/logon.faces
Help!
I don't really know perl so I don't know how to troubleshoot this further - especially since I'm not seeing errors. If someone can direct me to other things to try, it would be helpful.
In this age of DOM and Javascript, there's lots of things that can go wrong with Web automation. From your results, it looks like maybe the form is built in browser space, which can be really hard to deal with programmatically.
The way to be sure is to dump the original response and look at the form code it contains.
If that turns out to be your problem, your simplest recourse is something like Mozilla::Mechanize.
When dealing with forms, it can sometimes be easier to replicate the request the form generates than to try to work with the form through Mechanize.
Try using your browser's developer tools to monitor what happens when you log into the site manually (in Firefox or Chrome it'll be under the Network tab), and then generate the same request with Mechanize.
For example, the resulting code MIGHT look something like:
my $post_data = {
'_id2:logon:CMS' => "MYSERVER:6400",
'_id2:logon:SAP_SYSTEM' => "",
'_id2:logon:SAP_CLIENT' => "",
'_id2:logon:USERNAME' => "MYUSER",
'_id2:logon:PASSWORD' => "MYPWD",
'_id2:logon:AUTH_TYPE' => "secEnterprise",
};
$mech->post($url, $post_data);
unless ($mech->success()){
warn "Failed to post to $url: " . $mech->response()->status_line() . "\n";
}
print $mech->content();
Where $post_data should match exactly the data that's passed in the manual POST to the site, not just what's in the HTML: the keys or data could be transformed by JavaScript before the actual POST is made.
I had someone more knowledgeable than me give me help. The main hurdle was how the page was constructed in frames and how it operated. Here are the details:
The URL of the frame that contained the login page is "http://myserver:8080/BOE/CMC/0000000000/myuser/logon.faces". The main frame of the page had a form in it, but it wasn't the logon form, which explains why the form from my original code didn't have the logon fields I was expecting.
The other "gotcha" that I ran into was that after a successful logon, the site redirects you to a different URL: "http://myserver:8080/BOE/CMC/0000000000/myuser/App/home.faces?service=%2Fmyuser%2FApp%2F". So to check a successful login, I had to get this URL and check for whatever text I decided to look for.
I also had to refer to the logon form by id and not by name (since the form did not have a name).
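For anyone hitting the same problem, frames can be located without a browser. The sketch below scans a page for frame and iframe tags and resolves their src against the page URL. The markup is a made-up approximation of the CMC outer page (only the servletBridgeIframe name and the URLs come from the details above), so treat it as an illustration rather than the real page source.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

my $base = 'http://myserver:8080/BOE/CMC';

# Hypothetical outer page: a hidden form plus the iframe that holds the real logon form.
my $html = <<'HTML';
<form target="servletBridgeIframe" style="display:none;" method="post"></form>
<iframe name="servletBridgeIframe" src="/BOE/CMC/0000000000/myuser/logon.faces"></iframe>
HTML

my @frame_urls;
my $parser = HTML::TokeParser->new( \$html );

# get_tag returns [ tagname, \%attributes, ... ] for each matching start tag.
while ( my $tag = $parser->get_tag( 'iframe', 'frame' ) ) {
    my $src = $tag->[1]{src} or next;
    push @frame_urls, URI->new_abs( $src, $base )->as_string;
}

print "$_\n" for @frame_urls;
```

Each URL found this way can then be fetched with $mech->get and inspected for the form that was missing from the outer page.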
Here's the working code:
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get("http://myserver:8080/BOE/CMC/0000000000/myuser/logon.faces");
$mech->form_id("_id2");
$mech->field("_id2:logon:CMS", "MYSERVER:6400");
$mech->field("_id2:logon:SAP_SYSTEM", "");
$mech->field("_id2:logon:SAP_CLIENT", "");
$mech->field("_id2:logon:USERNAME", "MyUser");
$mech->field("_id2:logon:PASSWORD", "MyPwd");
$mech->field("_id2:logon:AUTH_TYPE", "secEnterprise");
$mech->click;
$mech->get("http://myserver:8080/BOE/CMC/0000000000/myuser/App/home.faces?service=%2Fmyuser%2FApp%2FappService.jsp&appKind=CMC");
my $output_page = $mech->content();
if (index($output_page, "Welcome:") != -1)
{
print "\n\n+++++ Successful login! ++++++\n\n";
}
else
{
print "\n\n----- Login failed!-----\n\n";
}
For validating that I had successfully logged in, I kept it very simple and just searched for the "Welcome:" text (as in "Welcome: MyUser").

Filling out a basic form perl WWW::Mechanize

I'm trying to fill out the 'form' on the Twitter search home page (link in the code defined as $url). It's very basic: a box for what you want to search and a search button. But it's giving me a lot of difficulty, and I can't seem to get it to work.
Here is the portion of my script where I'm filling out the 'form':
my $mech = WWW::Mechanize->new();
my $url = "https://twitter.com/search-home";
blah. blah. blah.
$mech->get($url);
$mech->submit_form(
form_number=> 1,
fields => {
query => $tweetsearch,
button => "Search",
}
);
print $mech->uri();
When it prints, it prints out $url, meaning it didn't do anything, whereas it should print https://twitter.com/searchsrc=typd&q=from%3Anikestore%20%22jordan%22%20%22concord%22%20%22now%20available%22%20since%3A2014-5-2
Any help?
You're using the wrong form:
$mech->submit_form(
form_id => "search-home-form",
fields => {
q => "#YourTimeHasCome #MVP #NBA2K15",
},
);
Like many modern websites, Twitter tends to make heavy use of JavaScript.
From the WWW::Mechanize documentation:
Please note that Mech does NOT support JavaScript, you need additional software for that. Please check "JavaScript" in WWW::Mechanize::FAQ for more.
The FAQ entry mentions these other modules which may have better JS support:
Gtk2::WebKit::Mechanize
WWW::Mechanize::Firefox
WWW::Scripter
WWW::Selenium
Many websites also offer computer-friendly APIs. Here is Twitter's API documentation.

Avoid Form from being submitted twice by user with Perl (CGI)

I’m writing a small Perl page that receives a POST method submit. I want to be able to prevent from a single person/computer to submit the form multiple times (to avoid flooding with repetitive submits). But I can’t find any examples or explanations on how to do this in Perl CGI. Could you advise or direct me to some examples?
I understand I can use some data from the HTTP header (token?) and/or plant a cookie after the first submit, but I’m not sure how.
Any help will be appreciated.
Best Regards,
-Arseny
The simplest way to stop users from clicking the button several times would be to add some JavaScript to your page. That would of course not work for scripts, or for e.g. pressing F5.
<input type="submit" name="go" id="go" value="go" onclick="this.disabled='disabled'"/>
You could also write a log file/database on the server that holds the IP address of the user and the timestamp, and check whether they have already submitted. Doing that in addition to setting and checking a cookie is probably the way to go.
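The server-side check could look roughly like the sketch below. It keeps the IP-to-timestamp map in a plain hash so that it runs on its own; in a CGI script that data has to live in a file or database, because every request starts a fresh process. The 60-second window and the function name allow_submit are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my %last_seen;       # ip => epoch time of last accepted submit (stand-in for a file or DB)
my $window = 60;     # assumed minimum number of seconds between submits per IP

# Returns 1 and records the submit if allowed, 0 if this IP submitted too recently.
sub allow_submit {
    my ( $ip, $now ) = @_;
    my $prev = $last_seen{$ip};
    return 0 if defined $prev && $now - $prev < $window;
    $last_seen{$ip} = $now;
    return 1;
}

# Three submits from the same address at t=1000, t=1010 and t=1100.
my @results = map { allow_submit( '192.0.2.1', $_ ) } ( 1000, 1010, 1100 );
print join( ', ', map { $_ ? 'accepted' : 'rejected' } @results ), "\n";
```

In the CGI script itself, the address would come from $q->remote_addr and the time from time().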
For cookies, see cookies in the CGI doc. Simple example:
use strict; use warnings;
use CGI;
my $q = CGI->new;
my $submitted = 0;
if ($q->cookie('submitted')) {
$submitted = 1;
}
# Here you could place the file/db check to also set $submitted
if ($submitted) {
print $q->header('text/plain');
print "You have already submitted!";
} else {
# Do stuff with the form, like $q->param('foo')...
# Once you're done, place the cookie
print $q->header(
-type => 'text/plain',
-cookie => $q->cookie(
-name => 'submitted',
-value => 1,
-expires => '+1y',
));
}

How do I use Perl's LWP to log in to a web application?

I would like to write a script to login to a web application and then move to other parts
of the application:
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;
use Data::Dumper;
my $ua = LWP::UserAgent->new(keep_alive=>1);
my $req = POST "http://example.com:5002/index.php",
[ user_name => 'username',
user_password => "password",
module => 'Users',
action => 'Authenticate',
return_module => 'Users',
return_action => 'Login',
];
my $res = $ua->request($req);
print Dumper(\$res);
if ( $res->is_success ) {
print $res->as_string;
}
When I try this code I am not able to log in to the application. The HTTP status code returned is 302 (Found), but with no data.
If I post username/password with all required things then it should return the home page of the application and keep the connection live to move other parts of the application.
You may be able to use WWW::Mechanize for this purpose:
Mech supports performing a sequence of page fetches including following links and submitting forms. Each fetched page is parsed and its links and forms are extracted. A link or a form can be selected, form fields can be filled and the next page can be fetched. Mech also stores a history of the URLs you've visited, which can be queried and revisited.
I'm guessing that LWP isn't following the redirect:
push @{ $ua->requests_redirectable }, 'POST';
Any reason why you're not using WWW::Mechanize?
I've used LWP to log in to plenty of web sites and do stuff with the content, so there should be no problem doing what you want. Your code looks good so far but two things I'd suggest:
As mentioned, you may need to make the requests redirectable
You may also need to enable cookies:
$ua->cookie_jar( {} );
Hope this helps
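Putting the two suggestions together, a user agent set up this way might look like the sketch below. It only configures the object and prints the resulting settings; no request is made, and whether these two changes are enough for the application in the question is not something this snippet can confirm.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( keep_alive => 1 );

# By default LWP does not follow redirects issued in response to a POST.
push @{ $ua->requests_redirectable }, 'POST';

# An in-memory cookie jar so the session cookie set at login is sent on later requests.
$ua->cookie_jar( {} );

print "redirectable: @{ $ua->requests_redirectable }\n";
print "cookie jar:   ", ref( $ua->cookie_jar ), "\n";
```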