WWW::Mechanize::Firefox CSS Selector with multiple elements? - perl

When using WWW::Mechanize::Firefox to select an item, is it possible to iterate through a number of selectors that have the same name?
I use the following code:
my $un = $mech->selector('input.normal', single => 1);
The response is 2 elements found for CSS selector. Is there a way to use XPath or a better method, or is it possible to loop through the results?
Bonus point: typing into the inputs even though it is not in form elements (ie. uses JavaScript)

With the single option you have specified that there should be exactly one element that matches the selector. That is why you get an error message when it finds two matches.
The method will return a list of matches, and you can either use one => 1 in place of single => 1, which will throw van error if there isn't at least one match, or you can leave the option out altogether, when it will simply return all that it finds.
my #inputs = $mech->selector('input.normal')
will fill the array #inputs with a list of matching <input> elements, however many there are.

Module documentation contain these examples:
my $link = $mech->xpath('//a[id="clickme"]', one => 1);
# croaks if there is no link or more than one link found
my #para = $mech->xpath('//p');
# Collects all paragraphs

Related

Two-sided references in a list

I'm just starting to learn Perl and I have to do an exercise containing references.
I have to create a program, that constructs a list with two sided references, that are are received as command line arguments. At the beginning of the program, there is only one element in the list - 0. To go through the list, reference is being used, that references to the only element of the list at the moment - 0. The arguments of the command line are being read one by one and added right behind the element, that is being referenced to. When one argument is added, the reference slides one element to the right(it references to the newly added element). There are also two special elements - + and -. + allows the reference to move one element to the right, and - one element to the left. Also, it is important that the reference would not go beyond the list limit.
The output is all the arguments in the correct order of the list.
Additional requirements are that the list must be created by using hashes, that contain links to neighbouring elements. Also, I cannot use arrays to store the whole list.
There are a few examples to make it easier to grasp the concept, this is the most useful one:
./3.pl A D - B C + E
0 A B C D E
All I've got now is just the start of the program, it is nowhere near done and doesn't compile, but I can't figure out where to go from there. I've tried looking for some information about two-sided references(I'm not sure if I'm translating it correctly), but I can't seem to find anything. Any information about two-sided references or any tips how to start writing this program properly would be very appreciated.
My code:
#!/usr/bin/perl
use strict;
use warnings;
my $A= {
value=>'0',
prev=>'undef',
next=>'$B'
};
my $B= {
value=>'0',
prev=>'$A',
next=>'$C'
};
my $C= {
value=>'0',
prev=>'$B',
next=>'undef'
};
for my $smbl(0..#$ARGV) {
$A-> {value} = $ARGV[$smbl];
$Α-> {next} = $ARGV[$smbl+1];
}
First of all, what you are building is called a doubly linked list.
Let me tell you the biggest trick for working with linked lists: Create a dummy "head" node and a dummy "tail" node. You won't print their values, but having them will greatly reduce the number of special cases in your code, making it so much simpler!
At the core, you will have three "pointers" (references).
$head points to the first node of the list.
$tail points to the last node of the list.
$cursor initially points to the node in $tail. New nodes will be inserted before this node.
When processing +, two different situations you need to handle:
$cursor == $tail: Error! Cursor moved beyond end of list.
$cursor != $tail: Point $cursor to the node following the one it references.
When processing -, there are two different situations you need to handle:
$cursor->{prev} == $head: Error! Cursor moved beyond start of list.
$cursor->{prev} != $head: Point $cursor to the node preceding the one it references.
When processing inserting nodes, no checks need to be performed because of the dummy nodes!

Handle POST data sent as array

I have an html form which sends a hidden field and a radio button with the same name.
This allows people to submit the form without picking from the list (but records a zero answer).
When the user does select a radio button, the form posts BOTH the hidden value and the selected value.
I'd like to write a perl function to convert the POST data to a hash. The following works for standard text boxes etc.
#!/usr/bin/perl
use CGI qw(:standard);
sub GetForm{
%form;
foreach my $p (param()) {
$form{$p} = param($p);
}
return %form;
}
However when faced with two form inputs with the same name it just returns the first one (ie the hidden one)
I can see that the inputs are included in the POST header as an array but I don't know how to process them.
I'm working with legacy code so I can't change the form unfortunately!
Is there a way to do this?
I have an html form which sends a hidden field and a radio button with
the same name.
This allows people to submit the form without picking from the list
(but records a zero answer).
That's an odd approach. It would be easier to leave the hidden input out and treat the absence of the data as a zero answer.
However, if you want to stick to your approach, read the documentation for the CGI module.
Specifically, the documentation for param:
When calling param() If the parameter is multivalued (e.g. from multiple selections in a scrolling list), you can ask to receive an array. Otherwise the method will return the first value.
Thus:
$form{$p} = [ param($p) ];
However, you do seem to be reinventing the wheel. There is a built-in method to get a hash of all paramaters:
$form = $CGI->new->Vars
That said, the documentation also says:
CGI.pm is no longer considered good practice for developing web applications, including quick prototyping and small web scripts. There are far better, cleaner, quicker, easier, safer, more scalable, more extensible, more modern alternatives available at this point in time. These will be documented with CGI::Alternatives.
So you should migrate away from this anyway.
Replace
$form{$p} = param($p); # Value of first field named $p
with
$form{$p} = ( multi_param($p) )[-1]; # Value of last field named $p
or
$form{$p} = ( grep length, multi_param($p) )[-1]; # Value of last field named $p
# that has a non-blank value

JQuery Wildcard for using atttributes in selectors

I've research this topic extensibly and I'm asking as a last resort before assuming that there is no wildcard for what I want to do.
I need to pull up all the text input elements from the document and add it to an array. However, I only want to add the input elements that have an id.
I know you can use the \S* wildcard when using an id selector such as $(#\S*), however I can't use this because I need to filter the results by text type only as well, so I searching by attribute.
I currently have this:
values_inputs = $("input[type='text'][id^='a']");
This works how I want it to but it brings back only the text input elements that start with an 'a'. I want to get all the text input elements with an 'id' of anything.
I can't use:
values_inputs = $("input[type='text'][id^='']"); //or
values_inputs = $("input[type='text'][id^='*']"); //or
values_inputs = $("input[type='text'][id^='\\S*']"); //or
values_inputs = $("input[type='text'][id^=\\S*]");
//I either get no values returned or a syntax error for these
I guess I'm just looking for the equivalent of * in SQL for JQuery attribute selectors.
Is there no such thing, or am I just approaching this problem the wrong way?
Actually, it's quite simple:
var values_inputs = $("input[type=text][id]");
Your logic is a bit ambiguous. I believe you don't want elements with any id, but rather elements where id does not equal an empty string. Use this.
values_inputs = $("input[type='text']")
.filter(function() {
return this.id != '';
});
Try changing your selector to:
$("input[type='text'][id]")
I figured out another way to use wild cards very simply. This helped me a lot so I thought I'd share it.
You can use attribute wildcards in the selectors in the following way to emulate the use of '*'. Let's say you have dynamically generated form in which elements are created with the same naming convention except for dynamically changing digits representing the index:
id='part_x_name' //where x represents a digit
If you want to retrieve only the text input ones that have certain parts of the id name and element type you can do the following:
var inputs = $("input[type='text'][id^='part_'][id$='_name']");
and voila, it will retrieve all the text input elements that have "part_" in the beginning of the id string and "_name" at the end of the string. If you have something like
id='part_x_name_y' // again x and y representing digits
you could do:
var inputs = $("input[type='text'][id^='part_'][id*='_name_']"); //the *= operator means that it will retrieve this part of the string from anywhere where it appears in the string.
Depending on what the names of other id's are it may start to get a little trickier if other element id's have similar naming conventions in your document. You may have to get a little more creative in specifying your wildcards. In most common cases this will be enough to get what you need.

Dom-Processing with Perl-Mechanize: finalizing a little programme

I'm currently working on a little harvester, using this dataset of 2700 foundations. All the data are free to use with no limitations or copyright isues.
What I have so far: The harvesting task should be no problem if I take WWW::Mechanize — particularly for doing the form based search and selecting the individual entries. Hmm — I guess that the algorithm would be basically two nested loops: the outer loop runs the form-based search, the inner loop processes the search results.
The outer loop would use the select() and the submit_form() functions on the second search form on the page. Can we use DOM processing here? Well — how can we get the get the selection values.
The inner loop through the results would use the follow link function to get to the actual entries using the following call.
$mech->follow_link(url_regex => qr/webgrab_path=http:\/\/evs2000.*\?
Id=\d+$/, n => $result_nbr);
This would forward our mechanic browser to the entry page. Basically the URL query looks for links that have the webgrap_path to Id pattern, which is unique for each database entry. The $result_nbr variable tells mecha which one of the results it should follow next.
If we have several result pages we would also use the same trick to traverse through the result pages. For the semantic extraction of the entry information,we could parse the content of the actual entries with XML:LibXML's html parser (which works fine on this page), because it gives you some powerful DOM selection (using XPath) methods.
Well the actual looping through the pages should be doable in a few lines of Perl (max. 20 lines — likely less).
But wait: the processing of the entry pages will then be the most complex part
of the script.
Approaches: In principle we could do the same algorithm with a single while loop
if we use the back() function smartly.
Can you give me a hint for the beginning — the processing of the entry pages — doing this in Perl:: Mechanize?
Here's what I have:
GetThePage(
starting url
);
sub GetThePage {
my $mech ...
my #pages = ...
while(#pages) {
my $page = shift #pages;
$mech->get( $page );
push #pages, GetMorePages( $mech );
SomethingImportant( $mech );
SomethingXPATH( $mech );
}
}
The question is how to find the DOM-paths.
Use Firebug, Opera Dragonfly, Chromium Developer tools.
Call the context menu on the indicated element to copy an XPath expression or CSS selector (useful for Web::Query) to clipboard.
Really you want to use Web::Scraper for this kind of thing.

Why do all of these methods of accessing an array work?

It seems to me that some of these should fail, but they all output what they are supposed to:
$, = "\n";
%test = (
"one" => ["one_0", "one_1"],
"two" => ["two_0", "two_1"]
);
print #{$test{"one"}}[0],
#{$test{"one"}}->[0],
$test{"two"}->[0],
$test{"one"}[1];
Why is this?
Your first example is different than the others. It is an array slice -- but in disguise since you are only asking for one item.
#{$test{"one"}}[0];
#{$test{"one"}}[1, 0]; # Another slice example.
Your other examples are alternative ways of de-referencing items within a multi-level data structure. However, the use of an array for de-referencing is deprecated (see perldiag).
#{$test{"one"}}->[0]; # Deprecated. Turn on 'use warnings'.
$test{"two"}->[0]; # Valid.
$test{"one"}[1]; # Valid but easier to type.
Regarding the last example, two subscripts sitting next to each other have an implied -> between them. See, for example, the discussion of the Arrow Rule in perlreftut.