I am using WWW::Scripter to grab a page written with javascript/ajax, the "link" to the next page is a div tag, I can get the tag but cannot seem to figure out a way to click on it to get to the next page.. Any suggestions?
my $w = new WWW::Scripter;
$w->use_plugin('Ajax');
$w->get($c->website);
my $loop = 1;
my $page = 1;
while ($loop) {
my $te = HTML::TableExtract->new();
$content = $w->content();
$te->parse($content);
$table = $te->first_table_found;
$str .= Dumper $table;
$page += 1;
$loop = $self->next_page($w);
}
sub next_page {
my $self = shift;
my $w = shift;
$div = $w->document->getElementById('example_next');
if (defined $div) {
--I want to click on the div and move to the next page, suggestions?---
return 1;
} else {
return 0;
}
}
example html code... First there is a table holding the data...
<table class="display" id="example">
<thead>
headers
</thead>
<tbody>---DATA---</tbody>
</table>
Then pagination to go from "page" to "page" the data is rewritten with each pagination click..
<div class="dataTables_paginate paging_two_button" id="example_paginate">
<div class="paginate_disabled_previous" title="Previous" id="example_previous"></div>
<div class="paginate_enabled_next" title="Next" id="example_next"></div>
</div>
This is all using www.datatables.net
You need to identify the JavaScript call that occurs when that div's id is clicked, and then execute it. Alternatively you could use WWW::Mechanize::Firefox or WWW::Selenium.
Related
if($_POST) {
$mysql_pass = '';
$connect = #mysql_connect($_POST['dbHost'], $_POST['dbUser'], $mysql_pass);
if($connect) {
if(#mysql_select_db($_POST['dbName'])) {
$dir = $_SERVER['SERVER_NAME']. $_SERVER['REQUEST_URI'];
$redirectUrl = str_replace("configForm.php","site/index", $dir);
print $redirectUrl; exit;
$dbConf = dirname(__FILE__).'/dbConfig/dbconf.php';
$handle = fopen($dbConf, 'w') or die('Cannot open file: '.$dbConf);
$data = '<?php $userName="'.$_POST['dbUser'].'";'."\n";
$data .= '$passWord=" ";'."\n";
$data .= '$dbName="'.$_POST['dbName'].'";'."\n";
$data .= '$host="'.$_POST['dbHost'].'"; ?>';
fwrite($handle, $data);
header('Location: site/index');
}
else {
$error = 'Could not select database.';
}
}
else
$error = 'Not connected';
}
when the code above was executed, my webpage just displayed "localhost/www-edusec/site/index". i do not know where i do it wrong. my form action is like below:
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="POST">
The action attribute defaults to the current page anyway, so that piece of code is surely redundant.
The problem is PHP_SELF returns the path to the file too. Just get rid of the action line completely and it should work fine.
EDIT:
Also, the reason your script is cutting off half way, you have an exit; after it prints out the directory.
Imagine an HTML page that is a report with repetitive structure:
<html>
<body>
<h1>Big Hairy Report Page</h1>
<div class="customer">
<div class="customer_id">001</div>
<div class="customer_name">Joe Blough</div>
<div class="customer_addr">123 That Road</div>
<div class="customer_city">Smallville</div>
<div class="customer_state">Nebraska</div>
<div class="order_info">
<div class="shipping_details">
<ul>
<li>Large crate</li>
<li>Fragile</li>
<li>Express</li>
</ul>
</div>
<div class="order_item">Deluxe Hoodie</div>
<div class="payment">35.95</div>
<div class="order_id">000123456789</div>
</div>
<div class="comment">StackOverflow rocks!</div>
</div>
<div class="customer">
<div class="customer_id">002</div>
.... and so forth for a list of 150 customers
This kind of report page appears often. My goal is to extract each customer's related information into some reasonable data structure using HTML::TreeBuilder::XPath.
I know to do the basics and get the file read into $tree. But how can one concisely loop through that tree and get associated clusters of information per each customer? How, for example, would I create a list of address labels sorted by customer number based on this information? What if I want to sort all my customer information by state?
I'm not asking for the whole perl (I can read my file, output to file, etc). I just need help understanding how to ask HTML::TreeBuilder::XPath for those bundles of related data, and then how to dereference them. If it's easier to express this in terms of an output statement (i.e., Joe Blough ordered 1 Deluxe Hoodie and left 1 comment) then that's cool, too.
Thank you very much for those of you who tackle this one, it seems a bit overwhelming to me.
This will do what you need.
It starts by pulling all the <div class="customer"> elements into array #customers and extracting the information from there.
I have taken your example of the address label, sorted by the customer number (by which I assume you mean the field with class="customer_id"). All of the address values are pulled from the array into the hash %customers, keyed by the customer ID and the name of the element class. The information is then printed in the order of the ID.
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
my $tree = HTML::TreeBuilder::XPath->new_from_file('html.html');
my #customers = $tree->findnodes('/html/body/div[#class="customer"');
my %customers;
for my $cust (#customers) {
my $id = $cust->findvalue('div[#class="customer_id"]');
for my $field (qw/ customer_name customer_addr customer_city customer_state /) {
my $xpath = "div[\#class='$field']";
my $val = $cust->findvalue($xpath);
$customers{$id}{$field} = $val;
}
}
for my $id (sort keys %customers) {
my $info = $customers{$id};
print "Customer ID $id\n";
print $info->{customer_name}, "\n";
print $info->{customer_addr}, "\n";
print $info->{customer_city}, "\n";
print $info->{customer_state}, "\n";
print "\n";
}
output
Customer ID 001
Joe Blough
123 That Road
Smallville
Nebraska
use HTML::TreeBuilder::XPath;
...
my #customers;
my $tree = HTML::TreeBuilder::XPath->new_from_content( $mech->content() );
foreach my $customer_section_node ( $tree->findnodes('//div[ #class = "customer" ]') ) {
my $customer = {};
$customer->{id} = find_customer_id($customer_section_node);
$customer->{name} = find_customer_name($customer_section_node);
...
push #customers, $customer;
}
$tree->delete();
sub find_customer_id {
my $node = shift;
my ($id) = $node->findvalues('.//div[ #class = "customer_id" ]');
return $id
}
I'll use XML::LibXML since it's faster and I'm familiar with it, but it should be pretty straightforward to convert what I post to from XML::LibXML to HTML::TreeBuilder::XPath if you so desire.
use XML::LibXML qw( );
sub get_text { defined($_[0]) ? $_[0]->textContent() : undef }
my $doc = XML::LibXML->load_html(...);
my #customers;
for my $cust_node ($doc->findnodes('/html/body/div[#class="customer"]')) {
my $id = get_text( $cust_node->findnodes('div[#class="customer_id"]') );
my $name = get_text( $cust_node->findnodes('div[#class="customer_name"]') );
...
push #customers, {
id => $id,
name => $name,
...
};
}
Actually, given the regularity of the data, you don't have to hardcode the field names.
use XML::LibXML qw( );
sub parse_list {
my ($node) = #_;
return [
map parse_field($_),
$node->findnodes('li')
];
}
sub parse_field {
my ($node) = #_;
my #children = $node->findnodes('*');
return $node->textContent() if !#children;
return parse_list($children[0]) if $children[0]->nodeName() eq 'ul';
return {
map { $_->getAttribute('class') => parse_field($_) }
#children
};
}
{
my $doc = XML::LibXML->load_html( ... );
my #customers =
map parse_field($_),
$doc->findnodes('/html/body/div[#class="customer"]');
...
}
I'm an old-newbie in Perl, and Im trying to create a subroutine in perl using HTML::TokeParser and URI.
I need to extract ALL valid links enclosed within on div called "zone-extract"
This is my code:
#More perl above here... use strict and other subs
use HTML::TokeParser;
use URI;
sub extract_links_from_response {
my $response = $_[0];
my $base = URI->new( $response->base )->canonical;
# "canonical" returns it in the one "official" tidy form
my $stream = HTML::TokeParser->new( $response->content_ref );
my $page_url = URI->new( $response->request->uri );
print "Extracting links from: $page_url\n";
my($tag, $link_url);
while ( my $div = $stream->get_tag('div') ) {
my $id = $div->get_attr('id');
next unless defined($id) and $id eq 'zone-extract';
while( $tag = $stream->get_tag('a') ) {
next unless defined($link_url = $tag->[1]{'href'});
next if $link_url =~ m/\s/; # If it's got whitespace, it's a bad URL.
next unless length $link_url; # sanity check!
$link_url = URI->new_abs($link_url, $base)->canonical;
next unless $link_url->scheme eq 'http'; # sanity
$link_url->fragment(undef); # chop off any "#foo" part
print $link_url unless $link_url->eq($page_url); # Don't note links to itself!
}
}
return;
}
As you can see, I have 2 loops, first using get_tag 'div' and then look for id = 'zone-extract'. The second loop looks inside this div and retrieve all links (or that was my intention)...
The inner loop works, it extracts all links correctly working standalone, but I think there is some issues inside the first loop, looking for my desired div 'zone-extract'... Im using this post as a reference: How can I find the contents of a div using Perl's HTML modules, if I know a tag inside of it?
But all I have by the moment is this error:
Can't call method "get_attr" on unblessed reference
Some ideas? Help!
My HTML (Note URL_TO_EXTRACT_1 & 2):
<more html above here>
<div class="span-48 last">
<div class="span-37">
<div id="zone-extract" class="...">
<h2 class="genres"><img alt="extracting" class="png"></h2>
<li><a title="Extr 2" href="**URL_TO_EXTRACT_1**">2</a></li>
<li><a title="Con 1" class="sel" href="**URL_TO_EXTRACT_2**">1</a></li>
<li class="first">Pàg</li>
</div>
</div>
</div>
<more stuff from here>
I find that TokeParser is a very crude tool requiring too much code, its fault is that only supports the procedural style of programming.
A better alternatives which require less code due to declarative programming is Web::Query:
use Web::Query 'wq';
my $results = wq($response)->find('div#zone-extract a')->map(sub {
my (undef, $elem_a) = #_;
my $link_url = $elem_a->attr('href');
return unless $link_url && $link_url !~ m/\s/ && …
# Further checks like in the question go here.
return [$link_url => $elem_a->text];
});
Code is untested because there is no example HTML in the question.
So I have a reporting tool that spits out job scheduling statistics in an HTML file, and I'm looking to consume this data using Perl. I don't know how to step through a HTML table though.
I know how to do this with jQuery using
$.find('<tr>').each(function(){
variable = $(this).find('<td>').text
});
But I don't know how to do this same logic with Perl. What should I do? Below is a sample of the HTML output. Each table row includes the three same stats: object name, status, and return code.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
<HEAD>
<meta name="GENERATOR" content="UC4 Reporting Tool V8.00A">
<Title></Title>
<style type="text/css">
th,td {
font-family: arial;
font-size: 0.8em;
}
th {
background: rgb(77,148,255);
color: white;
}
td {
border: 1px solid rgb(208,213,217);
}
table {
border: 1px solid grey;
background: white;
}
body {
background: rgb(208,213,217);
}
</style>
</HEAD>
<BODY>
<table>
<tr>
<th>Object name</th>
<th>Status</th>
<th>Return code</th>
</tr>
<tr>
<td>JOBS.UNIX.S_SITEVIEW.WF_M_SITEVIEW_CHK_FACILITIES_REGISTRY</td>
<td>ENDED_OK - ended normally</td>
<td>0</td>
</tr>
<tr>
<td>JOBS.UNIX.ADMIN.INFA_CHK_REP_SERVICE</td>
<td>ENDED_OK - ended normally</td>
<td>0</td>
</tr>
<tr>
<td>JOBS.UNIX.S_SITEVIEW.WF_M_SITEVIEW_CHK_FACILITIES_REGISTRY</td>
<td>ENDED_OK - ended normally</td>
<td>0</td>
</tr>
The HTML::Query module is a wrapper around the HTML parser that provides a querying interface that is familiar to jQuery users. So you could write something like
use HTML::Query qw(Query);
my $docName = "test.html";
my $doc = Query(file => $docName);
for my $tr ($doc->query("td")) {
for my $td (Query($tr)->query("td")) {
# $td is now an HTML::Element object for the td element
print $td->as_text, "\n";
}
}
Read the HTML::Query documentation to get a better idea of how to use it--- the above is hardly the prettiest example.
You could use a RegExp but Perl already has modules built for this specific task. Check out HTML::TableContentParser
You would probably do something like this:
use HTML::TableContentParser;
$tcp = HTML::TableContentParser->new;
$tables = $tcp->parse($HTML);
foreach $table (#$tables) {
foreach $row (#{ $tables->{rows} }) {
foreach $col (#{ $row->{cols} }) {
# each <td>
$data = $col->{data};
}
}
}
Here I use the HTML::Parser, is a little verbose, but guaranteed to work. I am using the diamond operator so, you can use it as a filter. If you call this Perl source extractTd, here are a couple of ways to call it.
$ extractTd test.html
or
$ extractTd < test.html
will both work, output will go on standard output and you can redirect it to a file.
#!/usr/bin/perl -w
use strict;
package ExtractTd;
use 5.010;
use base "HTML::Parser";
my $td_flag = 0;
sub start {
my ($self, $tag, $attr, $attrseq, $origtext) = #_;
if ($tag =~ /^td$/i) {
$td_flag = 1;
}
}
sub end {
my ($self, $tag, $origtext) = #_;
if ($tag =~ /^td$/i) {
$td_flag = 0;
}
}
sub text {
my ($self, $text) = #_;
if ($td_flag) {
say $text;
}
}
my $extractTd = new ExtractTd;
while (<>) {
$extractTd->parse($_);
}
$extractTd->eof;
Have you tried looking at cpan for HTML libraries? This seems to do what your wanting
http://search.cpan.org/~msisk/HTML-TableExtract-2.11/lib/HTML/TableExtract.pm
Also here is a whole page of different HTML related libraries to use
http://search.cpan.org/search?m=all&q=html+&s=1&n=100
Perl CPAN module HTML::TreeBuilder.
I use it extensively to parse a lot of HTML documents.
The concept is that you get an HTML::Element (the root node by example).
From it, you can look for other nodes:
Get a list of children nodes with ->content_list()
Get the parent node with ->parent()
Disclaimer: The following code has not been tested, but it's the idea.
my $root = HTML::TreeBuilder->new;
$root->utf8_mode(1);
$root->parse($content);
$root->eof();
# This gets you an HTML::Element, of the root document
$root->elementify();
my #td = $root->look_down("_tag", "td");
foreach my $td_elem (#td)
{
printf "-> %s\n", $td_elem->as_trimmed_text();
}
If your table is more complex than that, you could first find the TABLE element,
then iterate over each TR children, and for each TR children, iterate over TD elements...
http://metacpan.org/pod/HTML::TreeBuilder
So I am using a Zend_Form_Element_MultiCheckbox to display a long list of checkboxes. If I simply echo the element, I get lots of checkboxes separated by <br /> tags. I would like to figure out a way to utilize the simplicity of the Zend_Form_Element_MultiCheckbox but also display as multiple columns (i.e. 10 checkboxes in a <div style="float:left">). I can do it manually if I had an array of single checkbox elements, but it isn't the cleanest solution:
<?php
if (count($checkboxes) > 5) {
$columns = array_chunk($checkboxes, count($checkboxes) / 2); //two columns
} else {
$columns = array($checkboxes);
}
?>
<div id="checkboxes">
<?php foreach ($columns as $columnOfCheckboxes): ?>
<div style="float:left;">
<?php foreach($columnOfCheckboxes as $checkbox): ?>
<?php echo $checkbox ?> <?php echo $checkbox->getLabel() ?><br />
<?php endforeach; ?>
</div>
<?php endforeach; ?>
</div>
How can I do this same sort of thing and still use the Zend_Form_Element_MultiCheckbox?
The best place to do this is using a view helper. Here is something I thought of really quickly that you could do. You can use this in your view scripts are attach it to a Zend_Form_Element.
I am going to assume you know how to use custom view helpers and how to add them to form elements.
class My_View_Helper_FormMultiCheckbox extends Zend_View_Helper_FormMultiCheckbox
{
public function formMultiCheckbox($name, $value = null, $attribs = null,
$options = null, $listsep = "<br />\n")
{
// zend_form_element attrib has higher precedence
if (isset($attribs['listsep'])) {
$listsep = $attribs['listsep'];
}
// Store original separator for later if changed
$origSep = $listsep;
// Don't allow whitespace as a seperator
$listsep = trim($listsep);
// Force a separator if empty
if (empty($listsep)) {
$listsep = $attribs['listsep'] = "<br />\n";
}
$string = $this->formRadio($name, $value, $attribs, $options, $listsep);
$checkboxes = explode($listsep, $string);
$html = '';
// Your code
if (count($checkboxes) > 5) {
$columns = array_chunk($checkboxes, count($checkboxes) / 2); //two columns
} else {
$columns = array($checkboxes);
}
foreach ($columns as $columnOfCheckboxes) {
$html .= '<div style="float:left;">';
$html .= implode($origSep, $columnOfCheckboxes);
$html .= '</div>';
}
return $html;
}
}
If you need further explanation just let me know. I did this fairly quickly.
EDIT
The reason I named it the same and placed in a different directory was only to override Zend's view helper. By naming it the same and adding my helper path:
$view->addHelperPath('My/View/Helper', 'My_View_Helper');
My custom view helper gets precedence over Zend's helper. Doing this allowed me to test without changing any of my forms,elements, or views that used Zend's helper. Basically, that's how you replace one of Zend's view helpers with one of your own.
Only reason I mentioned the note on adding custom view helpers and adding to form elements was because I assumed you might rename the helper to better suit your needs.