Accessing an href value using HTML::TreeBuilder::XPath - perl

I am using the LWP::UserAgent,
HTML::Selector::XPath and
HTML::TreeBuilder::XPath modules to get the value of the href attribute of the first YouTube video in a set of search results.
My code so far is:
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath;
my $ua = LWP::UserAgent->new;
#my $response =..
my $html = "http://www.youtube.com/results?search_query=run+flo+rida";
my $tree = HTML::TreeBuilder::XPath->new;
my $xpath = HTML::Selector::XPath::selector_to_xpath("(//*[#id = 'search-results']/li)[1]/div[2]/h3/a/#href/");
my #nodes = $tree->findnodes($xpath);
print" $nodes[0]";
I'm not sure if my printing is incorrect of if some other syntax is wrong. As of now it prints
HTML::TreeBuilder::XPath=HASH(0x1a78250)
when I am looking for it to print
/watch?v=JP68g3SYObU
Thanks for any help!

There are a number of problems here.
You must always use strict and use warnings at the top of every Perl program. It will catch many errors that you would easily overlook, and is only polite when you are asking for help with your code. In this case it would have warned you that your XPath string contained array variable names #id and #href which you may not have intended to be interpolated into the string.
You are using HTML::Selector::XPath, which translates a CSS selector to an XPath expression. But you are supplying it an XPath expression, so it will not work and the module is not needed.
There is no need to use LWP at all, as HTML::TreeBuilder has a new_from_url constructor which will fetch the HTML page for you.
This program seems to do what you need. I have also added the URI module to derive an absolute URL from the relative one in the href attribute value.
use strict;
use warnings;
use HTML::TreeBuilder::XPath;
use URI;
my $url = "http://www.youtube.com/results?search_query=run+flo+rida";
my $tree = HTML::TreeBuilder::XPath->new_from_url($url);
my $anchor = $tree->findnodes('//ol[#id="search-results"]//h3[#class="yt-lockup2-title"]/a/#href');
my $href = URI->new_abs($anchor->[0]->getValue, $url);
print $href;
output
http://www.youtube.com/watch?v=JP68g3SYObU

Related

How to fetch a one table from HTML source file using lwp module?

I'm beginner. I want to know how to fetch one table form the source HTML file using LWP module? Is it possible to use Regex with LWP?
You can use LWP to get the HTML source of a web page. Most easily, by using the get() function from LWP::Simple.
my $html = get('http://example.com/');
Now, in $html you have a text string (potentially a very long text string) which contains HTML. You can use any techniques you want to extract data from that string.
(Hint: Using a regex to do this is likely to be a very bad idea. It will be far harder than you expect and probably very fragile. Perhaps use a better tool - like HTML::TableExtract instead.)
use Web::Query::LibXML 'wq';
wq('https://www.december.com/html/demo/table.html')
->find('table th')
->each(sub {
my (undef, $e) = #_;
print $e->text . "\n";
});
__END__
Outer Table
Inner Table
CORNER
Head1
Head2
Head3
Head4
Head5
Head6
Little

Test::MockObject::Extends with 'fields' gives error

So trying to upgrade some old test modules written by other people to support newer perls. Some of the tests are using Test::MockObject::Extends, but I've found running the following code errors out.
#!/usr/bin/env perl
package MyModule;
use strict;
use warnings;
use fields qw(field1 field2);
sub new {
my $self = shift;
unless (ref $self) {
$self = fields::new($self);
}
return $self;
}
package main;
use strict;
use warnings;
use Test::MockObject::Extends;
use Data::Dumper;
my $VAR1 = MyModule->new();
print Data::Dumper::Dumper($VAR1);
my $VAR2 = Test::MockObject::Extends->new($VAR1);
Error:
$ perl a
$VAR1 = bless( {}, 'MyModule' );
Modification of a read-only value attempted at /usr/local/share/perl/5.14.2/Test/MockObject/Extends.pm line 31.
I've looked at the changelog for Test::MockObject and perl 5.10 and can't see anything that directly looks like it causes this. I suspect its been broken for a while and something new for 5.10 just illuminated it.
I think what's happening here is a result of using fields::new. From the perldoc page:
perl 5.9.0 and higher: fields::new() creates and blesses a
restricted-hash comprised of the fields declared using the "fields"
pragma into the specified class.
I think Test::MockObject::Extends wants to modify the hash, hence boom.
#nfg's answer is correct, but there is a simple workaround: unlock the hash created by fields before passing it to Test::Object::Extends.
use Hash::Util qw(unlock_keys);
my $obj = Some::Class->new();
unlock_keys(%$obj);
$obj = Test::MockObject::Extends->new($obj);
This will fail if using a perl < 5.9, so if that is a concern then you could unlock the keys conditionally.
You may want to re-lock the keys after calling Test::MockObject::Extends and mocking any methods you want to intercept, because otherwise errant code that is accessing fields that should not exist in the object will not be caught.

How to navigate Web Pages in Perl using WWW::Mechanize and dropdowns?

I am having a problem, what i am trying to do is:
I open a webpage with WWW::Mechanize, fill the username and password and log in.
The issue I am having is, after logging in I have to select the value from a dropdown list
and after that I have to press submit.
How can i do that?
The code which i have used is:
#!/usr/bin/perl
use LWP::UserAgent;
use WWW::Mechanize;
use HTTP::Cookies;
use strict;
my $username="123456";
my $password="XXXXX";
my $project="Systems";
my $agent = WWW::Mechanize->new();
$agent->get('http://www.XXXXX.com');
$agent->form_name("login_form");
$agent->field("txtLoginId", $username);
$agent->field("txtPassword", $password);
$agent->submit();
#Till now it has success full logined, From here it has to select one value from a drop #down box
$agent->form_name("frmProject");
$agent->field("cboProject", $project);
my $response=$agent->submit();
if ($response->is_success){
print "\nContent:\n";
print $response->as_string;
}elsif ($response->is_error){
print $response->error_as_HTML;
}
WWW::Mechanize has a click method that you can use to click on a button. Or, study the submit_form method for which you can specify values of all the form elements. If the page uses javascript, WWW:Mechanize might be not suitable for your task (see for example WWW::Mechanize::Firefox for an alternative).
You need to use the select method. If you know the value (not the display-text), use this:
$agent->form_name("frmProject");
$agent->select('cboProject', $project);
my $response = $agent->submit();
If you don't take a look at the Mechanize FAQ. It says you have to do something like this:
# Find the correct input element
my ($projectlist) = $agent->find_all_inputs( name => 'cboProject' );
# Look up the value of the option
my %name_lookup;
#name_lookup{ $projectlist->value_names } = $projectlist->possible_values;
# use the display-text to get the correct value
my $value = $name_lookup{ $project };
Once you've done that, you can use the click-method to submit the page.
$agent->click_button('name_of_the_submit_button');
But if the button you have to click is the default action, $agent->submit() should do the trick as well.

Perl DBI Query -> JSON -> JQuery AutoComplete

I've been trying to read up on how to implement a JSON solution in order to use JQueryUI's autocomplete functionality. I am trying to use autocomplete to search a database on for a name and after selection populate the ID to a hidden object. I've seen alot of examples around the web, but haven't found the best way to implement this. The database doesn't change that often, so I'm not sure how to best approach this performance wise.
Backend:
#!/usr/bin/perl
use CGI;
use DBI;
use strict;
use warnings;
my $cgi = CGI->new;
my $dbh = DBI->connect('dbi:mysql:hostname=localhost;database=test',"test","test") or die $DBI::errstr;
my $sth = $dbh->prepare(qq{select id, name from test;}) or die
$dbh->errstr;
$sth->execute() or die $sth->errstr;
my $json = undef;
while(my #user = $sth->fetchrow_array()) {
$json .= qq{{"$user[0]" : "$user[1]"}};
}
print $cgi->header(-type => "application/json", -charset => "utf-8");
print $json;
The jQuery autocomplete needs a "value" or "label" field returned with the json result. If you do not include it, the jquery autocomplete will not work:
The basic functionality of the autocomplete works with the results of the query assigned to the ‘label’ and ‘value’ fields. Explanation on the ‘label’ and ‘value’ fields from the jQuery UI site:
“The local data can be a simple Array of Strings, or it contains Objects for each item in the array, with either a label or value property or both. The label property is displayed in the suggestion menu. The value will be inserted into the input element after the user selected something from the menu. If just one property is specified, it will be used for both, eg. if you provide only value-properties, the value will also be used as the label.”
Link to full example: http://www.jensbits.com/2011/05/09/jquery-ui-autocomplete-widget-with-perl-and-mysql/
You need to grap the JSON package from CPAN instead of doing this:
my $json = undef;
while(my #user = $sth->fetchrow_array()) {
$json .= qq{{"$user[0]" : "$user[1]"}};
}
For example, with JSON it'd look like this:
use JSON;
my $json = {};
while(my #user = $sth->fetchrow_array()) {
$json->{$user[0]} = $user[1];
}
print JSON::to_json($json);
The JSON package will automatically construct a valid JSON string from any Perl data structure you provide it. We use it all over the place on Melody and it's proved to be a real life saver for sanely converting a structure into valid JSON.
Here I'm talking about performance.
There is some trigger you can set to improve performance, client side you can set the minimum number of characters required before the request is sent.
You can also set the "timeout" between two characters typing before the request is sent.
If your database table is really huge, I suggest you put a LIMIT on results you retrieve.
First to avoid long request processing, but also because some clients like IE6 arent't really fast handling more than a hundred results (Not to say, it's also not really user friendly).
On a project using IE6, we limited the elements returned to 100. If the user can't reduce the search to 100 elements, we presume he/she doesn't know what he/she is looking for.
Hope it helps a bit.

How do I create an absolute URL from two components, in Perl?

Suppose I have:
my $a = "http://site.com";
my $part = "index.html";
my $full = join($a,$part);
print $full;
>> http://site.com/index.html
What do I have to use as join, in order to get my snippet to work?
EDIT: I'm looking for something more general. What if a ends with a slash, and part starts with one? I'm sure in some module, someone has this covered.
I believe what you're looking for is URI::Split, e.g.:
use URI::Split qw(uri_join);
$uri = uri_join('http', 'site.com', 'index.html')
use URI;
URI->new("index.html")->abs("http://site.com")
will produce
"http://site.com/index.html"
URI->abs will take care of merging the paths properly following your uri specification,
so
URI->new("/bah")->abs("http://site.com/bar")
will produce
"http://site.com/bah"
and
URI->new("index.html")->abs("http://site.com/barf")
will produce
"http://site.com/barf/index.html"
and
URI->new("../uplevel/foo")->abs("http://site.com/foo/bar/barf")
will produce
"http://site.com/foo/uplevel/foo"
alternatively, there's a shortcut sub in URI namespace that I just noticed:
URI->new_abs($url, $base_url)
so
URI->new_abs("index.html", "http://site.com")
will produce
"http://site.com/index.html"
and so on.
No need for ‘join‘, just use string interpolation.
my $a = "http://site.com";
my $part = "index.html";
my $full = "$a/$part";
print $full;
>> http://site.com/index.html
Update:
Not everything requires a module. CPAN is wonderful, but restraint is needed.
The simple approach above works very well if you have clean inputs. If you need to handle unevenly formatted strings, you will need to normalize them somehow. Using a library in the URI namespace that meets your needs is probably the best course of action if you need to handle user input. If the variance is minor File::Spec or a little manual clean-up may be good enough for your needs.
my $a = 'http://site.com';
my #paths = qw( /foo/bar foo //foo/bar );
# bad paths don't work:
print join "\n", "Bad URIs:", map "$a/$_", #paths;
my #cleaned = map s:^/+::, #paths;
print join "\n", "Cleaned URIs:", map "$a/$_", #paths;
When you have to handle bad stuff like $path = /./foo/.././foo/../foo/bar; is when you want definitely want to use a library. Of course, this could be sorted out using File::Spec's cannonical path function.
If you are worried about bad/bizarre stuff in the URI rather than just path issues (usernames, passwords, bizarre protocol specifiers) or URL encoding of strings, then using a URI library is really important, and is indisputably not overkill.
You might want to take a look at this, an implementation of a function similar to Python's urljoin, but in Perl:
http://sveinbjorn.org/urljoin_function_implemented_using_Perl
As I am used to Java java.net.URL methods, I was looking for a similar way to concatenate URI without any assumption about scheme, host or port (in my case, it is for possibly complex Subversion URL):
http://site.com/page/index.html
+ images/background.jpg
=> http://site.com/page/images/background.jpg
Here is the way to do it in Perl:
use URI;
my $base = URI->new("http://site.com/page/index.html");
my $result = URI->new_abs("images/background.jpg", $base);