How can I fill in web forms with Perl? - perl

I want to fill in a web form with Perl. I am having trouble finding out the correct syntax to accomplish this. As in, how do I go to the URL, select the form, fill in the form, and then press enter to be sure it has been submitted?

Something like WWW::Mechanize::FormFiller?

WWW::Mechanize and its friends are the way to go. There are several examples in Spidering Hacks, but you'll also find plenty more by googling for the module name.
Good luck, :)

Start with WWW::Mechanize::Shell:
perl -MWWW::Mechanize::Shell -e shell
get http://some/page
fillout
...
submit
Afterwards, type "script", and save generated code as something.pl - and that's about it. It's done.

Request the form's action URL with Net::HTTP or something (can't recall the exact module), and include the forms fields as a GET/POST parameter (whichever the form calls for).

HTML::Form works nicely, too.
The synopsis of the module is an excellent example:
use HTML::Form;
$form = HTML::Form->parse($html, $base_uri);
$form->value(query => "Perl");
use LWP::UserAgent;
$ua = LWP::UserAgent->new;
$response = $ua->request($form->click);

Related

perl WWW::Mechanize can't seem to find the right form or assign fields or click submit

So I'm trying to create a perl script that logs in to SAP BusinessObjects Central Management Console (CMC) page but it doesn't even look like it's finding the right form or finding the right field or even clicking Submit.
Here's my code:
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get("http://myserver:8080/BOE/CMC");
$mech->form_name("_id2");
$mech->field("_id2:logon:CMS", "MYSERVER:6400");
$mech->field("_id2:logon:SAP_SYSTEM", "");
$mech->field("_id2:logon:SAP_CLIENT", "");
$mech->field("_id2:logon:USERNAME", "MYUSER");
$mech->field("_id2:logon:PASSWORD", "MYPWD");
$mech->field("_id2:logon:AUTH_TYPE", "secEnterprise");
$mech->click;
print $mech->content();
When I run it, I don't get any errors but the output I get is the login page again. Even more puzzling, it doesn't seem to be accepting the field values I send it (the output would display default values instead of the values I assign it). Putting in a wrong user or password doesn't change anything - no error but I just get the login page back with default values
I think the script itself is fine since I changed the necessary fields and I was able to log in to our Nagios page (the output page definitely shows Nagios details). I think the CMC page is not so simple, but I need help in figuring out how to make it work.
What I've tried:
1
use Data::Dumper;
print $mech->forms;
print Dumper($mech->forms());
What that gave me is:
Current form is: WWW::Mechanize=HASH(0x243d828)
Part of the Dumper output is:
'attr' => {
'target' => 'servletBridgeIframe',
'style' => 'display:none;',
'method' => 'post'
},
'inputs' => []
I'm showing just that part of the Dumper output because it seems that's the relevant part. When I tried the same thing with our Nagios page, the 'attr' section had a 'name' field which the above doesn't. The Nagios page also had entries for 'inputs' such as 'useralias' and 'password' but the above doesn't have any entries.
2
$mech->form_number(1);
Since I wasn't sure I was referencing the form correctly, I just had it try using the first form it finds (the page only has one form anyway). My result was the same - no error and the output is the login page with default values.
3
I messed around with escaping (with '\') the underscore (_) and colon (:) in the field names.
I've searched and didn't find anything that said I had to escape any characters but it was worth a shot. All I know is, the Nagios page field names only contained letters and it worked.
I got field names from Chrome's developer tool. For example, the User Name form field showed:
<input type="text" id="_id2:logon:USERNAME" name="_id2:logon:USERNAME" value="Administrator">
I don't know if Mechanize has a problem with names starting with underscore or names containing colons.
4
$mech->click("_id2:logon:logonButton");
Since I wasn't sure the "Log On" button was being clicked I tried to specify it but it gave me an error:
No clickable input with name _id2:logon:logonButton at /usr/share/perl5/WWW/Mechanize.pm line 1676
That's probably because there is no name defined on the button (I used the id instead) but I thought it was worth a shot. Here's the code of the button:
<input type="submit" id="_id2:logon:logonButton" value="Log On" class="logonButtonNoHover logon_button_no_hover" onmouseover="this.className = 'logonButtonHover logon_button_hover';" onmouseout="this.className = 'logonButtonNoHover logon_button_no_hover';">
There's only one button on the form anyway so I shouldn't have needed to specify it (I didn't need to for the Nagios page)
5
The interactive shell of Mechanize
Here's the output when I tried to retrieve all forms on the page:
$ perl -MWWW::Mechanize::Shell -eshell
(no url)>get http://myserver:8080/BOE/CMC
Retrieving http://myserver:8080/BOE/CMC(200)
http://myserver:8080/BOE/CMC>forms
Form [1]
POST http://myserver:8080/BOE/CMC/1412201223/admin/logon.faces
Help!
I don't really know perl so I don't know how to troubleshoot this further - especially since I'm not seeing errors. If someone can direct me to other things to try, it would be helpful.
In this age of DOM and Javascript, there's lots of things that can go wrong with Web automation. From your results, it looks like maybe the form is built in browser space, which can be really hard to deal with programmatically.
The way to be sure is to dump the original response and look at the form code it contains.
If that turns out to be your problem, your simplest recourse is something like Mozilla::Mechanize.
When dealing with forms, it can sometimes be easier to replicate the request the form generates than to try to work with the form through Mechanize.
Try using your browser's developer tools to monitor what happens when you log into the site manually (in Firefox or Chrome it'll be under the Network tab), and then generate the same request with Mechanize.
For example, the resulting code MIGHT look something like:
my $post_data => {
'_id2:logon:CMS' => "MYSERVER:6400",
'_id2:logon:SAP_SYSTEM' => "",
'_id2:logon:SAP_CLIENT' => "",
'_id2:logon:USERNAME' => "MYUSER",
'_id2:logon:PASSWORD' => "MYPWD",
'_id2:logon:AUTH_TYPE' => "secEnterprise",
};
$mech->post($url, $post_data);
unless ($mech->success()){
warn "Failed to post to $url: " . $mech->response()->status_line() . "\n";
}
print $mech->content();
Where %post_data should match exactly the data that's passed in the manual post to the site and not just what's in the HTML--the keys or data could be transformed by javascript before the actual post is made.
I had someone more knowledgeable than me give me help. The main hurdle was how the page was constructed in frames and how it operated. Here are the details:
The URL of the frame that contained the login page is "http://myserver:8080/BOE/CMC/0000000000/myuser/logon.faces". The main frame of the page had a form in it, but it wasn't the logon form, which explains why the form from my original code didn't have the logon fields I was expecting.
The other "gotcha" that I ran into was that after a successful logon, the site redirects you to a different URL: "http://myserver:8080/BOE/CMC/0000000000/myuser/App/home.faces?service=%2Fmyuser%2FApp%2F". So to check a successful login, I had to get this URL and check for whatever text I decided to look for.
I also had to refer to the logon form by id and not by name (since the form did not have a name).
Here's the working code:
use strict;
use warnings;
use WWW::Mechanize;
use HTTP::Cookies;
my $mech = WWW::Mechanize->new();
$mech->cookie_jar(HTTP::Cookies->new());
$mech->get("http://myserver:8080/BOE/CMC/0000000000/myuser/logon.faces");
$mech->form_id("_id2");
$mech->field("_id2:logon:CMS", "MYSERVER:6400");
$mech->field("_id2:logon:SAP_SYSTEM", "");
$mech->field("_id2:logon:SAP_CLIENT", "");
$mech->field("_id2:logon:USERNAME", "MyUser");
$mech->field("_id2:logon:PASSWORD", "MyPwd");
$mech->field("_id2:logon:AUTH_TYPE", "secEnterprise");
$mech->click;
$mech->get("http://myserver:8080/BOE/CMC/0000000000/myuser/App/home.faces?service=%2Fmyuser%2FApp%2FappService.jsp&appKind=CMC");
$output_page = $mech->content();
if (index($output_page, "Welcome:") != -1)
{
print "\n\n+++++ Successful login! ++++++\n\n";
}
else
{
print "\n\n----- Login failed!-----\n\n";
}
For validating that I had successfully logged in, I kept it very simple and just searched for the "Welcome:" text (as in "Welcome: MyUser").

How can I check my post data in Zend?

I am a beginner and I am creating some forms to be posted into MySQL using Zend, and I am in the process of debugging but I don't really know how to debug anything using Zend. I want to submit the form and see if my custom forms are concatenating the data properly before it goes into MySQL, so I want to catch the post data to see a few things. How can I do this?
The Default route for zend framework application looks like the following
http://www.name.tld/$controller/$action/$param1/$value1/.../$paramX/$valueX
So all $_GET-Parameters simply get contenated onto the url in the above manner /param/value
Let's say you are within IndexController and indexAction() in here you call a form. Now there's possible two things happening:
You do not define a Form-Action, then you will send the form back to IndexController:indexAction()
You define a Form action via $form->setAction('/index/process') in that case you would end up at IndexController:processAction()
The way to access the Params is already defined above. Whereas $this->_getParam() equals $this->getRequest()->getParam() and $this->_getAllParams() equals $this->getRequest->getParams()
The right way yo check data of Zend Stuff is using Zend_Debug as #vascowhite has pointed out. If you want to see the final Query-String (in case you're manually building queries), then you can simply put in the insert variable into Zend_Debug::dump()
you can use $this->_getAllParams();.
For example: var_dump($this->_getAllParams()); die; will output all the parameters ZF received and halt the execution of the script. To be used in your receiving Action.
Also, $this->_getParam("param name"); will get a specific parameter from the request.
The easiest way to check variables in Zend Framework is to use Zend_Debug::dump($variable); so you can do this:-
Zend_Debug::dump($_POST);
Zend framework is built on the top of the PHP . so you can use var_dump($_POST) to check the post variables.
ZF has provided its own functions to get all the post variables.. Zend_Debug::dump($this->getRequest()->getPost())
or specifically for one variable.. you can use Zend_Debug::dump($this->getRequest()->getPost($key))
You can check post data by using zend
$request->isPost()
and for retrieving post data
$request->getPost()
For example
if ($request->isPost()) {
$postData = $request->getPost();
Zend_Debug::dump($postData );
}

How to pass $_FILES from Perl to PHP

I have to interact with a PHP script that receives file uploads using a form. Looking at the code it loops through an array called $_FILES. I need to be able to post to this form using Perl and would like to ask what would be the best way to pass the file names ? Would I use something like WWW:Mechanize ?
It sounds like your question is:
How do I simulate submitting a form (with a file input) using Perl?
The fact that it is handled with PHP on the backend is irrelevant to the problem.
Note that you need to submit actual files not file names.
WWW::Mechanize is an option, I'd probably use LWP::UserAgent myself, it makes use of HTTP::Request::Common which allows you to select files to upload by passing an arrayref instead of a string.
[ name => 'Gisle Aas',
email => 'gisle#aas.no',
gender => 'M',
born => '1964',
init => ["$ENV{HOME}/.profile"],
]
https://metacpan.org/pod/PHP::Interpreter#TYPE-HANDLING
Quick google search, allows PHP and Perl code to interact with one another.

How can I access parameters passed in the URL when a form is POSTed to my script?

I've run into an issue with mod_rewrite when submitting forms to our site perl scripts. If someone does a GET request on a page with a url such as http://www.example.com/us/florida/page-title, I rewrite that using the following rewrite rule which works correctly:
RewriteRule ^us/(.*)/(.*)$ /cgi-bin/script.pl?action=Display&state=$1&page=$2 [NC,L,QSA]
Now, if that page had a form on it I'd like to do a form post to the same url and have Mod Rewrite use the same rewrite rule to call the same script and invoke the same action. However, what's happening is that the rewrite rule is being triggered, the correct script is being called and all form POST variables are being posted, however, the rewritten parameters (action, state & page in this example) aren't being passed to the Perl script. I'm accessing these variables using the same Perl code for both the GET and POST requests:
use CGI;
$query = new CGI;
$action = $query->param('action');
$state = $query->param('state');
$page = $query->param('page');
I included the QSA flag since I figured that might resolve the issue but it didn't. If I do a POST directly to the script URL then everything works correctly. I'd appreciate any help in figuring out why this isn't currently working. Thanks in advance!
If you're doing a POST query, you need to use $query->url_param('action') etc. to get parameters from the query string. You don't need or benefit from the QSA modifier.
Change your script to:
use CGI;
use Data::Dumper;
my $query = CGI->new; # even though I'd rather call the object $cgi
print $query->header('text/plain'), Dumper($query);
and take a look at what is being passed to your script and update your question with that information.

How do I use and debug WWW::Mechanize?

I am very new to Perl and i am learning on the fly while i try to automate some projects for work. So far its has been a lot of fun.
I am working on generating a report for a customer. I can get this report from a web page i can access.
First i will need to fill a form with my user name, password and choose a server from a drop down list, and log in.
Second i need to click a link for the report section.
Third a need to fill a form to create the report.
Here is what i wrote so far:
my $mech = WWW::Mechanize->new();
my $url = 'http://X.X.X.X/Console/login/login.aspx';
$mech->get( $url );
$mech->submit_form(
form_number => 1,
fields =>{
'ctl00$ctl00$cphVeriCentre$cphLogin$txtUser' => 'someone',
'ctl00$ctl00$cphVeriCentre$cphLogin$txtPW' => '12345',
'ctl00$ctl00$cphVeriCentre$cphLogin$ddlServers' => 'Live',
button => 'Sign-In'
},
);
die unless ($mech->success);
$mech->dump_forms();
I dont understand why, but, after this i look at the what dump outputs and i see the code for the first login page, while i belive i should have reached the next page after my successful login.
Could there be something with a cookie that can effect me and the login attempt?
Anythings else i am doing wrong?
Appreciate you help,
Yaniv
This is several months after the fact, but I resolved the same issue based on a similar questions I asked. See Is it possible to automate postback from the client side? for more info.
I used Python's Mechanize instead or Perl, but the same principle applies.
Summarizing my earlier response:
ASP.NET pages need a hidden parameter called __EVENTTARGET in the form, which won't exist when you use mechanize normally.
When visited by a normal user, there is a __doPostBack('foo') function on these pages that gives the relevant value to __EVENTTARGET via a javascript onclick event on each of the links, but since mechanize doesn't use javascript you'll need to set these values yourself.
The python solution is below, but it shouldn't be too tough to adapt it to perl.
def add_event_target(form, target):
#Creates a new __EVENTTARGET control and adds the value specified
#.NET doesn't generate this in mechanize for some reason -- suspect maybe is
#normally generated by javascript or some useragent thing?
form.new_control('hidden','__EVENTTARGET',attrs = dict(name='__EVENTTARGET'))
form.set_all_readonly(False)
form["__EVENTTARGET"] = target
You can only mechanize stuff that you know. Before you write any more code, I suggest you use a tool like Firebug and inspect what is happening in your browser when you do this manually.
Of course there might be cookies that are used. Or maybe your forgot a hidden form parameter? Only you can tell.
EDIT:
WWW::Mechanize should take care of cookies without any further intervention.
You should always check whether the methods you called were successful. Does the first get() work?
It might be useful to take a look at the server logs to see what is actually requested and what HTTP status code is sent as a response.
If you are on Windows, use Fiddler to see what data is being sent when you perform this process manually, and then use Fiddler to compare it to the data captured when performed by your script.
In my experience, a web debugging proxy like Fiddler is more useful than Firebug when inspecting form posts.
I have found it very helpful to use Wireshark utility when writing web automation with WWW::Mechanize. It will help you in few ways:
Enable you realize whether your HTTP request was successful or not.
See the reason of failure on HTTP level.
Trace the exact data which you pass to the server and see what you receive back.
Just set an HTTP filter for the network traffic and start your Perl script.
The very short gist of aspx pages it that they hold all of the local session information within a couple of variables prefixed by "__" in the general aspxform. Usually this is a top level form and all form elements will be part of it, but I guess that can vary by implementation.
For the particular implementation I was dealing with I needed to worry about 2 of these state variables, specifically:
__VIEWSTATE
__EVENTVALIDATION.
Your goal is to make sure that these variables are submitted into the form you are submitting, since they might be part of that main form aspxform that I mentioned above, and you are probably submitting a different form than that.
When a browser loads up an aspx page a piece of javascript passes this session information along within the asp server/client interaction, but of course we don't have that luxury with perl mechanize, so you will need to manually post these yourself by adding the elements to the current form using mechanize.
In the case that I just solved I basically did this:
my $browser = WWW::Mechanize->new( );
# fetch the login page to get the initial session variables
my $login_page = 'http://www.example.com/login.aspx';
$response = $browser->get( $login_page);
# very short way to find the fields so you can add them to your post
$viewstate = ($browser->find_all_inputs( type => 'hidden', name => '__VIEWSTATE' ))[0]->value;
$validation = ($browser->find_all_inputs( type => 'hidden', name => '__EVENTVALIDATION' ))[0]->value;
# post back the formdata you need along with the session variables
$browser->post( $login_page, [ username => 'user', password => 'password, __VIEWSTATE => $viewstate, __EVENTVALIDATION => $validation ]);
# finally get back the content and make sure it looks right
print $response->content();