Given a query string and the empty form HTML it came from, generate filled-in HTML - Perl

I have an html with empty forms and I have the query string that was generated when filling in those forms.
How can I merge them together into a filled in html page?
Hopefully you can give me a perl based solution.
Edit: I have a web scraper based on WWW::Mechanize with Perl. I am saving the HTML content to generate an HTML slideshow of the session, but I can't save the HTML with the filled-in values.
I have looked at the Mechanize source and it creates an HTML::Form object to handle the forms. I have looked at HTML::Form and I don't see how I can turn the object back into HTML; there is just a dump method.
There is a section in the HTML::Form code that lets me generate the POST or GET request, and I thought that might be a good starting point for generating filled-in HTML by merging the request with the original HTML.
if ($method eq "GET") {
    require HTTP::Request;
    $uri = URI->new($uri, "http");
    $uri->query_form(@form);
    return HTTP::Request->new(GET => $uri);
}
elsif ($method eq "POST") {
    require HTTP::Request::Common;
    return HTTP::Request::Common::POST($uri, \@form,
        Content_Type => $enctype);
}
So I can use that code snippet in my Mechanize program after I am done filling in forms to get the final POST or GET request, but that is as far as I have got :(

I am assuming you want to show the form with its values.
All you have to do is set the value of a form element to be the value from the query string.
If you give more info, we can give you more specific answers...
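One way to do this in Perl is to parse the query string with URI and let HTML::FillInForm write the values back into the markup. This is only a minimal sketch: it assumes the field names in the query string match the name attributes in the saved HTML, and the helper name fill_form_from_query is made up for illustration.

```perl
use strict;
use warnings;
use URI;
use HTML::FillInForm;

# Hypothetical helper: merge a query string into the form HTML it came from.
sub fill_form_from_query {
    my ($html, $query_string) = @_;

    # Let URI decode the query string into a flat (name, value, ...) list.
    my $uri = URI->new('http://localhost/');
    $uri->query($query_string);
    my @pairs = $uri->query_form;

    # Collect values per field name; repeated names (checkboxes,
    # multi-selects) end up as array references.
    my %values;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        push @{ $values{$name} }, $value;
    }
    for my $name (keys %values) {
        $values{$name} = $values{$name}[0] if @{ $values{$name} } == 1;
    }

    # HTML::FillInForm returns a copy of the HTML with the values set.
    return HTML::FillInForm->fill(\$html, \%values);
}
```

With WWW::Mechanize, the HTML would typically come from $mech->content saved before submitting, and the query string from the request that HTML::Form generated.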

Related

Logging in using WWW::Mechanize

I'm looking at logging in to https://imputationserver.sph.umich.edu/index.html#!pages/login
with the following:
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use feature 'say';
use autodie ':all';
use WWW::Mechanize;
use DDP;
my $mech = WWW::Mechanize->new();
$mech->get( 'https://imputationserver.sph.umich.edu/index.html#!pages/login' );
my $username = '';
my $password = '';
#$mech->set_visible( $username, $password );
#$mech -> field('Username:', $username);
#$mech -> field('Password:', $password);
my %data;
@{ $data{links} } = $mech->find_all_links();
@{ $data{inputs} } = $mech->find_all_inputs();
@{ $data{submits} } = $mech->find_all_submits();
@{ $data{forms} } = $mech->forms();
p %data;
#$mech->set_fields('Username' => $username, 'Password' => $password);
but there doesn't appear to be any useful information, which is shown by printing:
{
forms [],
inputs [],
links [
[0] WWW::Mechanize::Link {
public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
private methods (0)
internals: [
[0] "favicon.ico",
[1] undef,
[2] undef,
[3] "link",
[4] URI::https,
[5] {
href "favicon.ico",
rel "icon"
}
]
},
[1] WWW::Mechanize::Link {
public methods (9) : attrs, base, name, new, tag, text, URI, url, url_abs
private methods (0)
internals: [
[0] "assets/css/loader.css",
[1] undef,
[2] undef,
[3] "link",
[4] var{links}[0][4],
[5] {
href "assets/css/loader.css",
rel "stylesheet"
}
]
}
],
submits []
}
I looked on Firefox's Tools -> page info, but got nothing valuable there, I don't see where the username and password are coming from on this page.
I've tried
$mech -> submit_form(
form_number => 0,
fields => { username => $username, password => $password },
);
but then I get No form defined
In terms of links, inputs, fields, I don't see any, and I don't know how to move on.
I don't see anything on https://metacpan.org/pod/WWW::Mechanize::Examples that helps me out in this situation.
How can I log in to this page using Perl's WWW::Mechanize?
As Dave says, many modern websites are going to be handling login via a Javascript-driven (private) API. You'll need to open the Network tab in your browser, do the login manually as you normally would, and watch the sequence of GETs, PUTs, POSTs, etc. that happen to see what interaction is needed to complete a login, and then execute that sequence yourself with Mech or LWP.
It's possible that the Javascript on the page is going to create JSON or even JWTs to do the interactions; you'll have to duplicate that in your code for it to work.
In particular, check the headers for cookies, and authentication and CSRF tokens being set; you'll need to capture those and re-send them with requests (POST requests will need the CSRF tokens). This may entail doing more interactions with the site to capture the sequence of operations and duplicate them. HTTP::Cookies should handle the cookies for you automatically, but more sophisticated header usage will require you to use HTTP::Headers to extract the data and possibly resubmit it that way.
At heart, the processes are all pretty simple; it's just a matter of accurately replicating them so that you can automate them.
You should check as to whether the site already has a programmer's API, and use that if so; such an API will almost always provide you simpler, direct interfaces to site functions and easier-to-use returned data formats. If the site is highly dynamic, like a heavy React site, it's possible that other pages in the site are going to load a skeletal HTML page and then use Javascript to fill it out as well; as the page evolves, your code will have to as well. If you're using a defined programmer's API, you will probably be able to depend on the interactions and returned data remaining the same as long as the API version doesn't change.
A final note: you should verify that you're not violating your user agreement by using automation. Some sites explicitly bar using automated methods of logging in.
The interesting part of the source from that page is this:
<body class="bg-light">
<div id="main">
<div class="spinner">
<div class="bounce1"></div>
<div class="bounce2"></div>
<div class="bounce3"></div>
</div>
</div>
<script src="./dist/bundles/cloudgene/index.js"></script>
</body>
So, there's no login form in the HTML that makes up that page. Which explains why WWW::Mechanize can't see anything - there's nothing there to see.
It seems that the page is all built by that Javascript file - index.js.
Now, you could spend hours reading that JS and working out exactly how the page works. But that'll be hard work and there's an easier way.
No matter how the client (the browser or your code) works, the actual login must be handled by an HTTP request and response. The client sends a request, the server responds and the client acts on that response. You just need to work out what the request and response look like and then reproduce that in your code.
And you can examine the HTTP requests and responses using tools that are almost certainly built into your browser (in Chrome, it's dot menu -> more tools -> developer tools). That will allow you to see exactly what the HTTP request looks like.
Having done that, you "just" need to craft a similar request using your Perl code. You'll probably find that's easier using LWP::UserAgent and its associated modules rather than WWW::Mechanize.
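As a rough illustration of that last step, here is a sketch using LWP::UserAgent and HTTP::Request::Common. The "/login" path and the username/password field names are placeholders, not the real interface of this site; copy the actual URL, fields and any required tokens from what the browser's Network tab shows. If the site sends JSON instead of form fields, the request body would need to be built accordingly.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;
use HTTP::Request::Common qw(POST);

# Build the login request. The '/login' path and the field names are
# placeholders: copy the real ones from the browser's Network tab.
sub build_login_request {
    my ($base_url, $username, $password) = @_;
    return POST(
        "$base_url/login",
        [ username => $username, password => $password ],
    );
}

# Send the request with a cookie-aware user agent so that any session
# cookie set by the server is kept for later requests.
sub log_in {
    my ($base_url, $username, $password) = @_;
    my $ua = LWP::UserAgent->new( cookie_jar => HTTP::Cookies->new );
    my $response = $ua->request(
        build_login_request($base_url, $username, $password)
    );
    die 'Login failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $ua;    # reuse this agent for authenticated requests
}
```

Keeping the request construction separate from the sending makes it easy to print the request with $request->as_string and compare it with what the browser sent.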
WWW::Mechanize is a web client with some HTML parsing capabilities. But as Dave Cross pointed out, the form you want is not in the HTML document you requested. It's generated by some JavaScript code. To do what the browser does would require a JavaScript engine, which WWW::Mechanize doesn't have.
The simplest way to achieve that is to remote-control a web browser (e.g. using Selenium::Chrome).
The other is to manually craft the login request without getting and filling the form.
Looking at your code, I see the following URL:
https://imputationserver.sph.umich.edu/index.html#!pages/login
It is this part in particular that drew my attention: #!pages/login
This likely means that the login form is not present on the page when it is loaded, and is instead added to the page with JavaScript after page load.
Your script doesn't know this, however, and looks for the login form and its elements right away after page load.
The easiest way to solve this issue is to place a hard-coded timeout of, let's say, 5 seconds between page load and trying to log in.
The more "correct" way of handling this is to wait for the login form to appear by checking for its controls, and then proceed with the login process.

How to receive json in Dancer?

I am very new to the Perl framework Dancer. As of now I have a GET HTTP listener working. I have an Angular front end trying to post a JSON string to Dancer. How can I retrieve the JSON and perhaps assign it to a scalar variable ($json)?
get '/games' => sub {
header 'Access-Control-Allow-Origin' => '*';
&loadgames();
return $games;
};
post '/newgame' => sub {
header 'Access-Control-Allow-Origin' => '*';
#what should i put here to retrieve the json string
#I plan to pass the json string to a sub to convert to XML
};
I am not sure whether Dancer is the right choice of backend framework for getting and posting data.
Thanks for the help!
If your HTTP request has a JSON body (Content-type: application/json) rather than being an HTML form post, then you probably want something like this:
post '/url-path' => sub {
    my $post = from_json( request->body );
    # do something with the POSTed data structure
    # which would typically be a hashref (or an arrayref)
    # e.g.: schema->resultset('Widget')->create($post);
};
The from_json routine is one of the DSL Keywords provided by Dancer.
Dancer provides the params keyword for accessing route, body, and query parameters. You want a body parameter. Exactly which body parameter you want will depend on the name of the field you posted it to the route with (look at your form or your ajax request).
my $json_string = params('body')->{$field_name};
You can also use param, if you don't have any conflicting parameter names in the route or query parameters.
Once you have the json, remember it's just a string at the moment. You might want to read it into a perl data structure: Dancer provides from_json for this purpose.
As an aside: I notice in your get route, you call a function loadgames in void context, and then return a variable you haven't declared (or perhaps you have set it as a global - but do you need it to be a global?). I recommend beginning each perl file with use strict; to pick up issues like this. I suspect you probably just want to use the return value of loadgames as your return value.

perl WWW::Mechanize can't seem to find the right form or assign fields or click submit

So I'm trying to create a perl script that logs in to SAP BusinessObjects Central Management Console (CMC) page but it doesn't even look like it's finding the right form or finding the right field or even clicking Submit.
Here's my code:
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get("http://myserver:8080/BOE/CMC");
$mech->form_name("_id2");
$mech->field("_id2:logon:CMS", "MYSERVER:6400");
$mech->field("_id2:logon:SAP_SYSTEM", "");
$mech->field("_id2:logon:SAP_CLIENT", "");
$mech->field("_id2:logon:USERNAME", "MYUSER");
$mech->field("_id2:logon:PASSWORD", "MYPWD");
$mech->field("_id2:logon:AUTH_TYPE", "secEnterprise");
$mech->click;
print $mech->content();
When I run it, I don't get any errors but the output I get is the login page again. Even more puzzling, it doesn't seem to be accepting the field values I send it (the output would display default values instead of the values I assign it). Putting in a wrong user or password doesn't change anything - no error but I just get the login page back with default values
I think the script itself is fine since I changed the necessary fields and I was able to log in to our Nagios page (the output page definitely shows Nagios details). I think the CMC page is not so simple, but I need help in figuring out how to make it work.
What I've tried:
1
use Data::Dumper;
print $mech->forms;
print Dumper($mech->forms());
What that gave me is:
Current form is: WWW::Mechanize=HASH(0x243d828)
Part of the Dumper output is:
'attr' => {
'target' => 'servletBridgeIframe',
'style' => 'display:none;',
'method' => 'post'
},
'inputs' => []
I'm showing just that part of the Dumper output because it seems that's the relevant part. When I tried the same thing with our Nagios page, the 'attr' section had a 'name' field which the above doesn't. The Nagios page also had entries for 'inputs' such as 'useralias' and 'password' but the above doesn't have any entries.
2
$mech->form_number(1);
Since I wasn't sure I was referencing the form correctly, I just had it try using the first form it finds (the page only has one form anyway). My result was the same - no error and the output is the login page with default values.
3
I messed around with escaping (with '\') the underscore (_) and colon (:) in the field names.
I've searched and didn't find anything that said I had to escape any characters but it was worth a shot. All I know is, the Nagios page field names only contained letters and it worked.
I got field names from Chrome's developer tool. For example, the User Name form field showed:
<input type="text" id="_id2:logon:USERNAME" name="_id2:logon:USERNAME" value="Administrator">
I don't know if Mechanize has a problem with names starting with underscore or names containing colons.
4
$mech->click("_id2:logon:logonButton");
Since I wasn't sure the "Log On" button was being clicked I tried to specify it but it gave me an error:
No clickable input with name _id2:logon:logonButton at /usr/share/perl5/WWW/Mechanize.pm line 1676
That's probably because there is no name defined on the button (I used the id instead) but I thought it was worth a shot. Here's the code of the button:
<input type="submit" id="_id2:logon:logonButton" value="Log On" class="logonButtonNoHover logon_button_no_hover" onmouseover="this.className = 'logonButtonHover logon_button_hover';" onmouseout="this.className = 'logonButtonNoHover logon_button_no_hover';">
There's only one button on the form anyway so I shouldn't have needed to specify it (I didn't need to for the Nagios page)
5
The interactive shell of Mechanize
Here's the output when I tried to retrieve all forms on the page:
$ perl -MWWW::Mechanize::Shell -eshell
(no url)>get http://myserver:8080/BOE/CMC
Retrieving http://myserver:8080/BOE/CMC(200)
http://myserver:8080/BOE/CMC>forms
Form [1]
POST http://myserver:8080/BOE/CMC/1412201223/admin/logon.faces
Help!
I don't really know perl so I don't know how to troubleshoot this further - especially since I'm not seeing errors. If someone can direct me to other things to try, it would be helpful.
In this age of DOM and Javascript, there's lots of things that can go wrong with Web automation. From your results, it looks like maybe the form is built in browser space, which can be really hard to deal with programmatically.
The way to be sure is to dump the original response and look at the form code it contains.
If that turns out to be your problem, your simplest recourse is something like Mozilla::Mechanize.
When dealing with forms, it can sometimes be easier to replicate the request the form generates than to try to work with the form through Mechanize.
Try using your browser's developer tools to monitor what happens when you log into the site manually (in Firefox or Chrome it'll be under the Network tab), and then generate the same request with Mechanize.
For example, the resulting code MIGHT look something like:
my $post_data = {
'_id2:logon:CMS' => "MYSERVER:6400",
'_id2:logon:SAP_SYSTEM' => "",
'_id2:logon:SAP_CLIENT' => "",
'_id2:logon:USERNAME' => "MYUSER",
'_id2:logon:PASSWORD' => "MYPWD",
'_id2:logon:AUTH_TYPE' => "secEnterprise",
};
$mech->post($url, $post_data);
unless ($mech->success()){
warn "Failed to post to $url: " . $mech->response()->status_line() . "\n";
}
print $mech->content();
where $post_data should match exactly the data that's passed in the manual post to the site and not just what's in the HTML; the keys or data could be transformed by Javascript before the actual post is made.
I had someone more knowledgeable than me give me help. The main hurdle was how the page was constructed in frames and how it operated. Here are the details:
The URL of the frame that contained the login page is "http://myserver:8080/BOE/CMC/0000000000/myuser/logon.faces". The main frame of the page had a form in it, but it wasn't the logon form, which explains why the form from my original code didn't have the logon fields I was expecting.
The other "gotcha" that I ran into was that after a successful logon, the site redirects you to a different URL: "http://myserver:8080/BOE/CMC/0000000000/myuser/App/home.faces?service=%2Fmyuser%2FApp%2F". So to check a successful login, I had to get this URL and check for whatever text I decided to look for.
I also had to refer to the logon form by id and not by name (since the form did not have a name).
Here's the working code:
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get("http://myserver:8080/BOE/CMC/0000000000/myuser/logon.faces");
$mech->form_id("_id2");
$mech->field("_id2:logon:CMS", "MYSERVER:6400");
$mech->field("_id2:logon:SAP_SYSTEM", "");
$mech->field("_id2:logon:SAP_CLIENT", "");
$mech->field("_id2:logon:USERNAME", "MyUser");
$mech->field("_id2:logon:PASSWORD", "MyPwd");
$mech->field("_id2:logon:AUTH_TYPE", "secEnterprise");
$mech->click;
$mech->get("http://myserver:8080/BOE/CMC/0000000000/myuser/App/home.faces?service=%2Fmyuser%2FApp%2FappService.jsp&appKind=CMC");
my $output_page = $mech->content();
if (index($output_page, "Welcome:") != -1)
{
print "\n\n+++++ Successful login! ++++++\n\n";
}
else
{
print "\n\n----- Login failed!-----\n\n";
}
For validating that I had successfully logged in, I kept it very simple and just searched for the "Welcome:" text (as in "Welcome: MyUser").

WWW::Mechanize - how to POST without affecting page stack and/or current HTML::Form object?

I'm using WWW::Mechanize to automate placing 'orders' in our supplier's portal (with permission). It's pretty straightforward to do so by filling in the relevant form fields and submitting as normal.
However, the portal has been built with a JavaScript-capable client in mind, and some short-cuts were taken as a result; the most significant short-cut they took is that as you progress through a "wizard" (series of forms) with normal POSTS, they require that you "deallocate" some resources server side for the "previous wizard step" by doing an AJAX POST. In pseudo-code:
GET page #1
fill the fields of page #1
POST (submit the form) --> redirects to page #2.
POST (ajax request to "/delete.do")
fill the fields of page #2
POST (submit the form) --> redirects to page #3.
POST (ajax request to "/delete.do")
fill the fields of page #3.
...
What's the easiest approach to making those AJAX requests to "/delete.do"?
I've tried...
Approach (A)
If I inject a new form (referencing /delete.do in the action) into the DOM and use submit then the mech object will no longer have the HTML::Form object built from the previous redirects to page #X step.
If I just use back() from that point, does that make another GET to the server? (or is it just using the prior values from the page stack?)
Approach (B)
If I just use the post() method inherited from LWP::UserAgent to send the POST to /delete.do I get a security error -- I guess it's not using the cookie jar that has been set up by WWW::Mechanize.
Is there some canonical way to make an 'out-of-band' POST that:
Does use/update WWW::Mechanize's cookie jar
Does follow redirects
Doesn't alter the page stack
Doesn't alter the current HTML::Form object
UPDATE:
For anyone trying to replicate the solution suggested by gangabass, you actually need to:
(1) Subclass WWW::Mechanize, overriding update_html such that a
new content can be injected into the HTML on demand.
This content would normally be parsed by HTML::Form::parse(). The
override sub needs to alter the first non-self parameter $html
before calling the original implementation and returning the result.
package WWW::Mechanize::Debug;
use base 'WWW::Mechanize';
sub update_html {
my ($self,$html) = @_;
$html = $WWW::Mechanize::Debug::html_updater->($html)
if defined($WWW::Mechanize::Debug::html_updater);
return $self->SUPER::update_html($html);
}
1;
(2) In the main program, use WWW::Mechanize::Debug as per
WWW::Mechanize
use WWW::Mechanize::Debug;
my $mech = WWW::Mechanize::Debug->new;
(3) Inject the HTML form which will need to be submit()ed.
{
my $formHTML = qq|
<form action="/delete.do" method="POST" name="myform">
<!-- some relevant hidden inputs go here in the real solution -->
</form>
|;
local $WWW::Mechanize::Debug::html_updater = sub {
my ($html) = @_;
$html =~ s|</body>|$formHTML</body>|;
return $html; # the override expects the modified HTML back
};
# Load the page containing the normal wizard step content.
$mech->get($the_url);
# This should now have an extra form injected into it.
}
(4) In a new scope clone() the mechanize object, fill the form and
submit it!
{
my $other = $mech->clone;
$other->form_name('myform'); # select the injected form on the clone
$other->field('foo' => 'bar'); # fill in the relevant fields to be posted
$other->submit;
}
(5) Continue using the original mechanize object as if that form
submission had never occurred.
You need to clone your Mech object and make POST from cloned version. Something like:
{
my $mech = $mech->clone();
$mech->post(....);
}
But of course it will be better to make sub for this.
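A different route, shown here only as a sketch and not as the cloning method described above, is to send the out-of-band POST through a plain LWP::UserAgent that shares the Mechanize cookie jar. A plain LWP::UserAgent keeps no page stack and no current form, so the original $mech is left as it was. The helper name side_channel_agent is made up for illustration, and whether the server accepts such a request without extra headers or tokens is an assumption to check against the browser's traffic.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use WWW::Mechanize;

# Hypothetical helper: build a plain user agent for out-of-band requests.
sub side_channel_agent {
    my ($mech) = @_;

    my $ua = LWP::UserAgent->new(
        cookie_jar => $mech->cookie_jar,    # same jar object as $mech
        agent      => $mech->agent,         # same User-Agent string
    );

    # LWP only follows redirects for GET and HEAD by default.
    push @{ $ua->requests_redirectable }, 'POST';

    return $ua;
}

# Usage (the URL and field are placeholders):
#   my $ua  = side_channel_agent($mech);
#   my $res = $ua->post("$base_url/delete.do", { key => 'value' });
#   warn $res->status_line unless $res->is_success;
```

Because both agents hold the same cookie jar object, cookies set by the side request are visible to $mech on its next request.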

How do I use and debug WWW::Mechanize?

I am very new to Perl and I am learning on the fly while I try to automate some projects for work. So far it has been a lot of fun.
I am working on generating a report for a customer. I can get this report from a web page I can access.
First, I need to fill in a form with my user name and password, choose a server from a drop-down list, and log in.
Second, I need to click a link for the report section.
Third, I need to fill in a form to create the report.
Here is what i wrote so far:
my $mech = WWW::Mechanize->new();
my $url = 'http://X.X.X.X/Console/login/login.aspx';
$mech->get( $url );
$mech->submit_form(
form_number => 1,
fields =>{
'ctl00$ctl00$cphVeriCentre$cphLogin$txtUser' => 'someone',
'ctl00$ctl00$cphVeriCentre$cphLogin$txtPW' => '12345',
'ctl00$ctl00$cphVeriCentre$cphLogin$ddlServers' => 'Live',
button => 'Sign-In'
},
);
die unless ($mech->success);
$mech->dump_forms();
I don't understand why, but after this I look at what the dump outputs and I see the code for the first login page, while I believe I should have reached the next page after my successful login.
Could there be something with a cookie that can affect me and the login attempt?
Anything else I am doing wrong?
I appreciate your help,
Yaniv
This is several months after the fact, but I resolved the same issue based on a similar question I asked. See Is it possible to automate postback from the client side? for more info.
I used Python's Mechanize instead of Perl, but the same principle applies.
Summarizing my earlier response:
ASP.NET pages need a hidden parameter called __EVENTTARGET in the form, which won't exist when you use mechanize normally.
When visited by a normal user, there is a __doPostBack('foo') function on these pages that gives the relevant value to __EVENTTARGET via a javascript onclick event on each of the links, but since mechanize doesn't use javascript you'll need to set these values yourself.
The python solution is below, but it shouldn't be too tough to adapt it to perl.
def add_event_target(form, target):
    #Creates a new __EVENTTARGET control and adds the value specified
    #.NET doesn't generate this in mechanize for some reason -- suspect maybe is
    #normally generated by javascript or some useragent thing?
    form.new_control('hidden','__EVENTTARGET',attrs = dict(name='__EVENTTARGET'))
    form.set_all_readonly(False)
    form["__EVENTTARGET"] = target
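A possible Perl adaptation, operating on the HTML::Form object that WWW::Mechanize hands back, might look like the sketch below. The helper name set_event_target is made up, and the control name you pass in has to be taken from the __doPostBack(...) call on the real page. The sketch handles both cases: the hidden input already exists in the form, or it has to be added.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical helper: make sure the form carries __EVENTTARGET = $target.
sub set_event_target {
    my ($form, $target) = @_;

    my $input = $form->find_input('__EVENTTARGET');
    if ($input) {
        # Hidden inputs are read-only in HTML::Form; unlock before setting.
        $input->readonly(0);
        $input->value($target);
    }
    else {
        # The page did not include the input, so add it ourselves.
        $form->push_input(
            'hidden',
            { name => '__EVENTTARGET', value => $target },
        );
    }
    return $form;
}

# Usage with WWW::Mechanize (the control name is a placeholder):
#   my $form = $mech->form_number(1);
#   set_event_target($form, 'ctl00$lnkReports');
#   $mech->submit;
```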
You can only mechanize stuff that you know. Before you write any more code, I suggest you use a tool like Firebug and inspect what is happening in your browser when you do this manually.
Of course there might be cookies that are used. Or maybe you forgot a hidden form parameter? Only you can tell.
EDIT:
WWW::Mechanize should take care of cookies without any further intervention.
You should always check whether the methods you called were successful. Does the first get() work?
It might be useful to take a look at the server logs to see what is actually requested and what HTTP status code is sent as a response.
If you are on Windows, use Fiddler to see what data is being sent when you perform this process manually, and then use Fiddler to compare it to the data captured when performed by your script.
In my experience, a web debugging proxy like Fiddler is more useful than Firebug when inspecting form posts.
I have found the Wireshark utility very helpful when writing web automation with WWW::Mechanize. It will help you in a few ways:
See whether your HTTP request was successful or not.
See the reason of failure on HTTP level.
Trace the exact data which you pass to the server and see what you receive back.
Just set an HTTP filter for the network traffic and start your Perl script.
The very short gist of aspx pages is that they hold all of the local session information within a couple of variables prefixed by "__" in the general aspxform. Usually this is a top-level form and all form elements will be part of it, but I guess that can vary by implementation.
For the particular implementation I was dealing with I needed to worry about 2 of these state variables, specifically:
__VIEWSTATE
__EVENTVALIDATION.
Your goal is to make sure that these variables are submitted into the form you are submitting, since they might be part of that main form aspxform that I mentioned above, and you are probably submitting a different form than that.
When a browser loads up an aspx page a piece of javascript passes this session information along within the asp server/client interaction, but of course we don't have that luxury with perl mechanize, so you will need to manually post these yourself by adding the elements to the current form using mechanize.
In the case that I just solved I basically did this:
use strict;
use warnings;
use WWW::Mechanize;
my $browser = WWW::Mechanize->new( );
# fetch the login page to get the initial session variables
my $login_page = 'http://www.example.com/login.aspx';
$browser->get( $login_page );
# very short way to find the fields so you can add them to your post
my $viewstate = ($browser->find_all_inputs( type => 'hidden', name => '__VIEWSTATE' ))[0]->value;
my $validation = ($browser->find_all_inputs( type => 'hidden', name => '__EVENTVALIDATION' ))[0]->value;
# post back the form data you need along with the session variables
$browser->post( $login_page, [ username => 'user', password => 'password', __VIEWSTATE => $viewstate, __EVENTVALIDATION => $validation ]);
# finally get back the content of the POST response and make sure it looks right
print $browser->content();