Why can't WWW::Mechanize find the right form?

I'm using WWW::Mechanize to retrieve a form from a webpage:
#!/usr/bin/perl
use WWW::Mechanize;
my $mechanize = WWW::Mechanize->new();
$mechanize->proxy(['http', 'ftp'], 'http://proxy/');
$mechanize->get("http://www.temp.com/");
$mechanize->form_id('signin');
The website HTML has code as follows
<form action="https://www.temp.com/session" id="signin" method="post">
but I get the error
There is no form with ID "signin" at SiteScraper.pl
What do I do?

Without knowing exactly what could be wrong, you might try printing whatever forms WWW::Mechanize is able to find in the response:
use Data::Dumper;
print Dumper($mechanize->forms());
It should output all the forms and their respective attributes etc.
Double check that the form is in the dump, otherwise something is wrong. Then check that the form's ->{attr}->{id} is what you expect as well.
You can also try to select the form using another way, e.g. by name, and see if that helps.
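If the full dump is too noisy, a shorter listing of each form's id, method and action may be easier to read. The following is a minimal sketch, not a tested fix for this site: it parses a made-up HTML snippet (a stand-in for $mechanize->content) with HTML::Form, the class whose objects $mechanize->forms() returns. The first form and the input names are invented for illustration.

```perl
use strict;
use warnings;
use HTML::Form;

# Stand-in for $mechanize->content; the first form and all input names
# are hypothetical and only here to make the listing non-trivial.
my $html = <<'HTML';
<html><body>
<form action="/search" method="get"><input name="q"></form>
<form action="https://www.temp.com/session" id="signin" method="post">
<input name="login"><input type="password" name="password">
</form>
</body></html>
HTML

# Parse every form on the page, resolving actions against the page URL.
my @forms = HTML::Form->parse($html, 'http://www.temp.com/');

my @ids;
for my $i (0 .. $#forms) {
    my $id = $forms[$i]->attr('id');    # undef when the form has no id
    push @ids, $id;
    printf "form %d: id=%s method=%s action=%s\n",
        $i + 1, defined $id ? $id : '(none)',
        $forms[$i]->method, $forms[$i]->action;
}
```

If the form you expect is missing from such a listing, it is worth printing $mechanize->content as well: the proxy may have returned a different page, or the form may be added by JavaScript, which WWW::Mechanize does not run.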

Related

How to convert txt file to html using perl

I have run some scripts to generate storage capacity report and configuration settings report using perl. I would like send this report to my mail id in html using perl.
Please note that I am new to perl programming.
Your question is very vague, so it's hard to be of much help. This might point you in the right direction though.
You basically have three tasks here.
Parse your report into data structures.
Use your data structures to generate an HTML document.
Send the HTML document by email.
I can't really help with step 1 as I know nothing about your report. If you have your file in CSV format, then Text::CSV will be useful to you. It's worth pointing out that if you're generating this report, then you could generate it in a format that is easier to parse, such as JSON.
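As a rough sketch of step 1, assuming the report is CSV with a header line: the column names (value, another_value) and the rows below are invented to match the template example, since the real report layout is unknown.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical report contents; the real report's columns are unknown.
my $report = <<'CSV';
value,another_value
/dev/sda1,"120 GB"
/dev/sdb1,"480 GB"
CSV

my $csv = Text::CSV->new({ binary => 1 });

# Read from an in-memory string here; a real script would open the report file.
open my $fh, '<', \$report or die "Cannot open in-memory report: $!";

# The first line holds the column names; use them as hash keys for each row.
my $header = $csv->getline($fh);
$csv->column_names(@$header);

my @data;
while (my $row = $csv->getline_hr($fh)) {
    push @data, $row;    # each $row is a hash reference keyed by column name
}
close $fh;

print "$_->{value}: $_->{another_value}\n" for @data;
```

This leaves @data as an array of hashes, which is the shape the template step expects.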
For step 2, I'd recommend a templating engine. I'd use the Template Toolkit, but other options are available. The idea is that you create a template file that contains all of your HTML with "tags" where you want your variable data to go. On a simple level it might look something like this:
<html>
<head><title>Some Title</title></head>
<body>
<h1>Some Title</h1>
<p>Blah...</p>
<table>
[% FOREACH row IN data -%]
<tr><td>[% row.value %]</td><td>[% row.another_value %]</td></tr>
[% END -%]
</table>
</body>
</html>
Assuming that's in a file called email.tt and you have your data in an array of hashes called @data, then you'd process the template like this:
use Template;
my @data = ({
value => 'something',
another_value => 'something else',
}, {
value => 'something',
another_value => 'something else',
});
my $tt = Template->new;
my $email_body;
# Passing a scalar reference makes process() write its output into $email_body
$tt->process('email.tt', { data => \@data }, \$email_body)
or die $tt->error;
That will give you your expanded HTML in $email_body. And that brings us to step 3.
I recommend Email::Stuffer for sending email.
use Email::Stuffer;
Email::Stuffer->from ('you@example.com')
->to ('someone_else@example.com')
->html_body($email_body)
->send;

going to a javascript link with mechanize-firefox

There is a link on a page, and I want to go to it but it just a javascript command. How with mechanize do I go to the link?
<span>abc</span>
Without the page and its HTML and JS one can only guess. Note that the follow_link() methods don't work with JS links. The method below does, but of course I cannot test without the page.
Probably the best bet is to get the link(s) as DOM object(s) for the click method:
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
# Get to your page with the link(s)
my $link = $mech->find_link_dom( text_regex => 'abc' ); # Or use find_all_links_dom()
$link->click();
# $mech->click( { dom => $link } ) # works as well
Besides text_regex there are the related options text and text_contains, along with a number of others. Note that the click method waits on a list of events before returning. See for example this recent post. This is critical for pages that take longer to complete.
See docs for find_link_dom() and click methods. They aren't very detailed or rich in examples but do provide enough to play with and figure it out.
If you need to interrogate links, use find_all_links_dom(), which returns an array or a reference to an array (depending on context) of Firefox DOM objects as MozRepl::RemoteObject instances.
my @links_dom = $mech->find_all_links_dom( text_contains => 'abc' );
# Example from docs for find_link_dom()
for my $ln (@links_dom) {
print $ln->{innerHTML} . "\n"
}
See the page for MozRepl::RemoteObject to see what you can do with it. If you only need to find out which link to click, the options for find_link_dom() should be sufficient.
This has been tested only with a toy page, that uses __doPostBack link, with <span> in the link.

How to pass a list of image filenames from Docpad to browser client

I want to get a list of filenames from a directory in my Docpad project and then pass them to the client for preloading. What is the best way to do this?
What I have been trying is to extract a list of file names from the directory and then pass them to the client via the document.
So I have a collection, like so in docpad.coffee:
collections:
  myImages: ->
    @getFilesAtPath({relativeOutDirPath: 'images'})
And then in the footer of my html, I've been trying something like this:
<script>
var images = <%= @getCollection('myImages').toJSON() %>
</script>
This, however, is not coming even close to working. What I really want to do is have images set to an array of URLs pointing to the files. But I can't seem to figure out how this would be done. The Docpad documentation and the Query-Engine documentation are simply too sparse.
Anyone have any ideas? Or is there a totally different way to think about this? Is there a way to hand over a variable directly from the Node/Docpad backend to the client, bypassing the need to pass it along with the HTML?
I'm not sure why you want to keep this in a separate script block; I guess it is just for clear separation. (By the way, you can put it in a separate script file if you like, as long as it is named with a js.eco extension.)
This is how it will work:
<script>
var images = [];
<% for obj in @getCollection("myImages").toJSON(): %>
images.push('<%= obj.url %>');
<% end %>
console.log(images[0]);
</script>
So you will have an array containing only the URLs. The console.log call prints the first element to show that it works.

How do I use Perl's Selenium::Remote::WebElement to verify the URL a hyperlink will take me to?

Seems like it should be straightforward but I can't seem to get to the bottom of it.
Here's the HTML I'm working with:
<li id="a" class="FILElevel3" onclick="changeMenu("b")">
<a onclick="stopBubble(event);" href="javascript:LinkPopup('/sub/URL.html')">Visible Text</a>
I'm able to find the element using XPaths:
my $returned_asset = $sel->find_element("//*[\@class='LINKlevel3']");
And I can verify this works because I'm able to extract the visible text from it:
my $returned_name = Selenium::Remote::WebElement::get_text($returned_asset);
I just can't seem to find the sequence to pull the HREF attribute from the element to put the link's URL into a verifiable string. Should I be able to do this using WebElement's get_attribute() method? I've tried variations on this:
my $returned_URL = $returned_asset-> Selenium::Remote::WebElement::get_attribute("a/href");
...where I've plugged in everything I could think of for that "a/href" string. What should go in there?
In the end I'd like to be able to put "javascript:LinkPopup('/sub/URL.html')" into a string and verify that my URL is in there.
Have you tried selecting the a element itself and asking for its href attribute?
my $returned_asset = $sel->find_element("//*[\@class='LINKlevel3']/a");
my $returned_URL = $returned_asset->Selenium::Remote::WebElement::get_attribute("href");
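Once the href is in a string, the verification itself is plain string matching. A minimal sketch, using a hard-coded stand-in for the value that get_attribute("href") would return; $expected and $popup_target are names made up for this example.

```perl
use strict;
use warnings;

# Stand-in for the value returned by get_attribute("href") in the real test.
my $returned_URL = "javascript:LinkPopup('/sub/URL.html')";

# The URL we expect the link to open (hypothetical variable name).
my $expected = '/sub/URL.html';

# Pull the quoted argument out of the LinkPopup(...) call.
my ($popup_target) = $returned_URL =~ /LinkPopup\('([^']*)'\)/;

if (defined $popup_target && $popup_target eq $expected) {
    print "Link target matches: $popup_target\n";
}
else {
    print "Unexpected href: $returned_URL\n";
}
```

Capturing the argument and comparing it with eq is stricter than a plain substring check, which would also accept a longer URL that merely contains the expected path.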

What is my problem filling forms with LWP and HTTP::Request::Form?

I'm new to Perl, currently writing a Perl script to automatically fill web forms and submit them using LWP. The website URL is ***/something.cgi and in that document there is a form I need to fill, then hit submit. That takes me to another page which has another form to fill, but the website's URL remains the same.
I managed to fill the first form and submit it using:
$res = $ua->request($f->press("submit"));
where
my $f = HTTP::Request::Form->new($forms[0], $url);
Viewing $res->as_string shows the source of the next page, but when I tried to get the new forms in order to fill them in, I got the same form I already had. How can I get the next page in order to fill in its forms and proceed?
I would recommend you look at WWW::Mechanize, which is a subclass of LWP::UserAgent, and its form methods.
EDIT
Adding an example closely based on the example from my first link:
use strict;
use warnings;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->get( 'http://google.com' );
sleep 1; ## be nice
$mech->submit_form(
form_number => 0,
fields => {
q => 'mungo',
}
);
print $mech->content;
The form you are trying to script must be using some parameter or cookie to determine which page of the multi-page form to process. Look at the headers returned in
print $res->headers->as_string;
to see if there are cookies being set for a session ID or other parameter that you need to pass back in.
Also, look at the source of the first form page vs the second, see if there are hidden input types that indicate that the second submission is for the second form page. Or, look at the value of the submit button tag, maybe that is different on the second page.
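To compare the hidden inputs of the two pages, something like the following sketch may help. It parses a made-up second page with HTML::Form; the field names (step, email) and the example.com URL are assumptions for illustration, and in the real script the HTML would come from $res->decoded_content.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical second page; the real markup and field names will differ.
my $page_two = <<'HTML';
<form action="/something.cgi" method="post">
<input type="hidden" name="step" value="2">
<input type="text" name="email">
<input type="submit" name="submit" value="Continue">
</form>
HTML

# In list context parse() returns every form; take the first one.
my ($form) = HTML::Form->parse($page_two, 'http://example.com/something.cgi');

# Collect the hidden inputs, which often carry the multi-page state.
my %hidden;
for my $input ($form->inputs) {
    next unless $input->type eq 'hidden';
    $hidden{ $input->name } = $input->value;
}

print "$_ = $hidden{$_}\n" for sort keys %hidden;
```

Running the same loop over the first page and comparing the two results should show which hidden value, if any, tells the server which step is being submitted.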