Perl HTML::TreeBuilder Class "Contains" Condition - perl

I'm trying to use Perl's HTML::TreeBuilder to extract data from an HTML page. My selectors include the following:
$root->look_down(_tag => 'div', class => 'member-search-results');
However, the div I'm looking for has multiple classes, one of which is member-search-results. With this code, I'm unable to find the div, and need to list all of the classes to get a successful match.
Is there any way I can do a class contains search on the elements, so that the code can also match tags like:
<div class="CLASS1 member-search-results CLASS2">...</div>
I understand that this should work:
$root->look_down(_tag => 'div', class => qr/member-search-results/);
But is this the correct way of doing this or is there a better method?
Thanks

Use Web::Query instead. Its CSS selectors follow the standard, so a class selector matches any element that carries that class among others.
use Web::Query qw();
Web::Query
->new_from_html('<div class="CLASS1 member-search-results CLASS2">...</div>')
->find('div.member-search-results')
->text; # returns '...'

As Philip pointed out, using the regex method gets the desired results. Specifically, here is what I used:
$tag = $tag->look_down(_tag => 'ol', class => qr/members/);
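One caveat with a bare regex such as qr/members/ is that it matches substrings, so a class like members-footer would match too. A minimal sketch of a stricter "class contains" test, assuming the class attribute is a whitespace-separated list (the sample HTML and variable names here are made up for illustration):

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup: only the first div carries the exact class.
my $html = '<div class="CLASS1 member-search-results CLASS2">hit</div>'
         . '<div class="member-search-results-footer">miss</div>';
my $root = HTML::TreeBuilder->new_from_content($html);

# Match the class name only as a whole whitespace-separated token.
my $class_re = qr/(?:^|\s)member-search-results(?:\s|$)/;

# In list context look_down returns every matching element.
my @divs = $root->look_down(_tag => 'div', class => $class_re);
my $text = $divs[0]->as_text;

print scalar(@divs), " match: $text\n";
```

The anchors (?:^|\s) and (?:\s|$) are what keep member-search-results-footer from matching; \b would not help here because a hyphen counts as a word boundary.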

Perhaps you need to split _tag and class into separate look_down calls and chain them together.
I use:
$tree->look_down( id => 'mw-content-text' )->look_down( _tag => 'ul' );
at https://github.com/pdurbin/scripts/blob/master/inthenews

Related

How to use Class::DBI with own constructors or OO-systems like Moo(se)?

When using Class::DBI in Perl, the insert() method from Class::DBI acts as a constructor returning an object. How can I use Class::DBI in combination with object attributes that are not part of any database tables?
For example: I would like to have my Music::Artist class to have a version attribute that is part of the resulting objects (so that I can use this object attribute in my overall application logic), but does not get written to the database?
Ultimately I would like to be able to combine the usage of Class::DBI with OO-systems like Moo(se).
Vanilla Class::DBI example code from metacpan:
package Music::DBI;
use base 'Class::DBI';
Music::DBI->connection('dbi:mysql:dbname', 'username', 'password');
package Music::Artist;
use base 'Music::DBI';
Music::Artist->table('artist');
Music::Artist->columns(All => qw/artistid name/);
#-- Meanwhile, in a nearby piece of code! --#
my $artist = Music::Artist->insert({ artistid => 1, name => 'U2' });
Pseudo-code of what I would like to do:
package Music::Artist;
use base 'Music::DBI';
use Moo;
Music::DBI->connection('dbi:mysql:dbname', 'username', 'password');
Music::Artist->table('artist');
Music::Artist->columns(All => qw/artistid name/);
has name => ( is => 'rw' );
has version => ( is => 'rw' );
#-- Meanwhile, in a nearby piece of code! --#
my $artist = Music::Artist->new( name => 'U2', version => '0.1.0' );
$artist = Music::Artist->insert({ artistid => 1, name => $artist->name });
# ... do something with $artist->version ...
(Although this code could run, the object returned by Class::DBI's insert() of course replaces the one returned by Moo's new(), so the version attribute is lost.)
How to combine Class::DBI with own or third-party (Moo) constructors?
I read Class::DBI's documentation but did not find any information on how to override insert() as an approach to supply a combined constructor method. I also tried to find repositories on GitHub that use Class::DBI together with their own constructors or OO-systems in the same packages, but did not succeed either.

Perl5 - Moose - Attribute accessors' names

I'd like to ask for advice regarding the naming of attribute accessors.
I started to develop a project that is supposed to have quite a ramified hierarchy of classes, for example, the SomeFramework class, a bunch of classes like SomeFramework::Logger and, let's say, classes similar to SomeFramework::SomeSubsystem::SomeComponent::SomeAPI classes.
My goal is to design the most efficient communication between all these classes. I'll explain how I'm doing it now, so maybe you would like to share some opinions on how to make it better.
When I initialize the SomeFramework class, I have an object reference which I use from my application.
my $someframework = SomeFramework->new(parameter => 'value');
The SomeFramework class has some attributes, such as logger, configuration, etc, here are some examples of their definitions:
has 'logger' => (
is => 'ro',
isa => 'SomeFramework::Logger',
reader => 'get_logger',
writer => '_set_logger',
builder => '_build_logger',
lazy => 1
);
sub _build_logger {
my $self = shift;
SomeFramework::Logger->new(someframework => $self);
}
I'm passing the reference to the parent object to the child object, because I need the child to have access to the parent and its methods and accessors. So in SomeFramework::Logger I have this attribute:
has 'someframework' => (
is => 'ro',
isa => 'SomeFramework',
reader => 'get_someframework',
writer => '_set_someframework',
required => 1
);
It lets me access any object from within the SomeFramework::Logger class. Usually it looks something like this:
my $configuration =
$self->
get_someframework->
get_configuration->
get_blah_blah;
To extrapolate it, let's look into the SomeFramework::SomeSubsystem::SomeComponent::SomeAPI class. This class has its own "parent" attribute (let's call it somecomponent) which is supposed to have a reference to a SomeFramework::SomeSubsystem::SomeComponent object as the value. The SomeFramework::SomeSubsystem::SomeComponent class has the attribute for its own parent attribute (we can call it somesubsystem) which is supposed to contain a reference to a SomeFramework::SomeSubsystem object. And, finally, this class has the attribute for its own parent too (someframework), so it contains the reference to a SomeFramework object.
It all makes it possible to have something like that inside of the SomeFramework::SomeSubsystem::SomeComponent::SomeAPI class:
my $configuration =
$self->
get_somecomponent->
get_somesubsystem->
get_someframework->
get_configuration->
get_blah_blah;
The first thing I'd like to know: is this good practice? I hope it is, but maybe you would advise me to take a smoother route?
The second question is a bit more complicated (as for me), but I hope you'll help me with it. :) I like canonical names of accessors recommended by D.Conway in his "Perl Best Practices", but I'd like to do something like that:
my $configuration = $self->sc->ss->sf->conf->blah_blah;
Surely I can name all readers in this laconic manner:
has 'some_framework' => (
is => 'ro',
isa => 'SomeFramework',
reader => 'sf',
writer => '_set_someframework',
required => 1
);
But I don't like the idea of managing without the "standard" accessors names. :(
Also I can use MooseX::Aliases, it works fine for something like that:
has 'some_framework' => (
is => 'ro',
isa => 'SomeFramework',
reader => 'get_someframework',
writer => '_set_someframework',
required => 1,
alias => 'sf'
);
It looks fine, but there's an issue with attributes whose names do NOT need to be shortened. For example:
has 'api' => (
is => 'ro',
isa => 'SomeFramework::SomeSubsystem::SomeComponent::API',
reader => '_get_api',
writer => '_set_api',
required => 1,
alias => 'api'
);
In this case Moose throws an exception: Conflicting init_args: (api, api) at constructor. :( As I understand, MooseX::Aliases tries to create an attribute with the same value of the init_args parameter, so it fails. By the way, sometimes it happens, but sometimes it works fine, I haven't discovered when exactly it doesn't work.
Maybe I should have something like that:
has 'api' => (
is => 'ro',
isa => 'SomeFramework::SomeSubsystem::SomeComponent::API',
reader => '_get_api',
writer => '_set_api',
required => 1,
handles => {
api => 'return_self' # It's supposed to have some method that only
# returns the reference to its own object
}
);
? But it doesn't seem to be the best option either, because it helps me only if the attribute contains a reference to some object for which I can define the return_self method. If the attribute contains a reference to some "foreign" object or some other value (e.g., a hash), it won't be possible to call that method. :(
Ugh... Sorry for such a long rant! I hope, you have managed to read to here. :)
I'll be very happy to hear what you think and what you would suggest. Feel free to share any ideas on this topic; any fresh ideas will be much appreciated!
Updated on 25.10.2015
As for the bigger question, let me see if I understood. There are an Apple and a Banana. The Fridge has both of them inside. But you want the Apple to know about the Fridge, and the Worm should know about the Apple, so that it can go from Worm up to Apple up to Fridge and turn the $fridge->light off when it wants to sleep. Is that correct? Sounds like a horrible idea that breaks all kinds of design patterns
Well, to be frank, I didn't think it's horrible. As for me, it's quite good when it's possible to have access from some class to some other class within the same framework. Why not? For example, let's imagine we have some class for the jobs-queue runner (let's call it SomeFramework::JobsQueue::Executor) and some class for jobs. Is it really bad to do something like:
package SomeFramework::JobsQueue::Executor;
use Moose;
use MooseX::Params::Validate;
has queue => (
isa => 'SomeFramework::JobsQueue',
required => 1,
reader => 'get_queue',
writer => '_set_queue'
);
# This attribute is being set by the framework when the framework
# creates the SomeFramework::JobsQueue::Executor-based object
sub execute {
my($self, $job, $options) = validated_list(
\@_,
job => { isa => 'SomeFramework::JobsQueue::Job' },
options => { isa => 'HashRef' }
);
my $queue = $self->get_queue;
$queue->mark_as_running($job->get_id);
$job->execute(options => $options);
$queue->mark_as_completed($job->get_id);
}
? So, our queue-runner object is aware of the queue object it "belongs" to, so it can call some methods of this queue object.
Or let's look at a much simpler example:
package SomeFramework::SomeSubsystem;
use Moose;
has 'some_framework' => (
isa => 'SomeFramework',
required => 1,
reader => 'get_some_framework',
writer => '_set_some_framework'
);
sub some_method {
my $self = shift;
$self->get_some_framework->get_logger->log_trace("Hello, world!");
}
So, our object knows how to call methods of the framework's object that has initialized that object, moreover it can call some methods of the framework's object and even some methods of other objects initialized and stored by the framework's object.
If it's really bad, would you be so kind as to help me to understand why? Thank you!
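For what it's worth, if the back-reference is kept, the long accessor chains can be shortened with Moose delegation (handles), and weak_ref avoids the parent/child reference cycle. A minimal sketch, assuming a simplified two-class version of the hierarchy above; the configuration contents and the describe method are made up for illustration:

```perl
use strict;
use warnings;

package SomeFramework::Logger {
    use Moose;

    has someframework => (
        is       => 'ro',
        isa      => 'SomeFramework',
        required => 1,
        weak_ref => 1,                      # the child does not keep the parent alive
        handles  => ['get_configuration'],  # delegate instead of chaining accessors
    );

    # Hypothetical method showing the delegated call in use.
    sub describe {
        my $self = shift;
        return 'level=' . $self->get_configuration->{level};
    }
}

package SomeFramework {
    use Moose;

    has configuration => (
        is      => 'ro',
        isa     => 'HashRef',
        reader  => 'get_configuration',
        default => sub { { level => 'trace' } },  # made-up sample data
    );

    has logger => (
        is      => 'ro',
        isa     => 'SomeFramework::Logger',
        reader  => 'get_logger',
        lazy    => 1,
        builder => '_build_logger',
    );

    sub _build_logger {
        my $self = shift;
        return SomeFramework::Logger->new(someframework => $self);
    }
}

my $framework   = SomeFramework->new;
my $description = $framework->get_logger->describe;
print "$description\n";
```

With this, code inside the logger calls $self->get_configuration directly, and each class only needs to know about its immediate parent rather than the whole chain.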

In Perl, what's the meaning of this code " has 'absolute_E' => (is => 'rw',default => sub {0} );"

The following code confuses me. I can't find any documentation about the syntax "has, is, default, lazy". Can anybody explain it in detail? Best wishes.
has 'absolute_E' => (is => 'rw', default => sub {0} );
has 'retract_speed_mm_min' => (is => 'lazy');
has 'retract_speed_mm_min' => (is => 'lazy');
Judging by this line, this is probably a Moo class. To confirm this, have a look near the top of the file, and you should see something like use Moo.
Moo is an object-oriented framework for Perl. I'll assume you understand OO concepts.
Some historical background: Perl 5 has built-in OO capabilities, however it can get a little cumbersome at times. Then Moose came around as an improved way of OOP in Perl. But Moose was also quite heavy, with a compile-time cost, so Moo (and also Mouse just before it) came after that as something of a lighter-weight subset of Moose.
has is for defining attributes in your class.
has 'absolute_E' => ( is => 'rw', default => sub {0} );
This defines an attribute named absolute_E.
is => 'rw' means it is readable and writable, which means you can do this:
my $value = $obj->absolute_E; # gets the value
$obj->absolute_E($value); # sets the value
When you instantiate the object, you can supply a value for the attribute:
my $obj = My::Class->new( absolute_E => 5 );
But if you don't supply anything then absolute_E is set to 0 by default.
This second attribute has a few more things:
has 'retract_speed_mm_min' => (is => 'lazy');
This is short form for:
has 'retract_speed_mm_min' => (
is => 'ro',
lazy => 1,
builder => '_build_retract_speed_mm_min'
);
This attribute is readonly which means you can't change its value after construction. But you can supply a value at construction as before.
The builder is another way of providing a default value. It requires the class to have a separate method named _build_retract_speed_mm_min that should return the default value.
lazy works with builder. It means that the builder is not called until the attribute is first used. The delay is useful when the builder depends on other attributes in order to compute this attribute's value.
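To see these pieces working together, here is a small sketch. The class name My::Printer and the retract_speed attribute are invented for the example; only absolute_E and retract_speed_mm_min come from the question:

```perl
use strict;
use warnings;

package My::Printer {
    use Moo;

    # Read-write attribute with a default of 0.
    has absolute_E => (is => 'rw', default => sub { 0 });

    # Hypothetical helper attribute (mm/s) that the builder depends on.
    has retract_speed => (is => 'ro', default => sub { 30 });

    # 'lazy' means: read-only, lazy => 1, builder => '_build_retract_speed_mm_min'.
    has retract_speed_mm_min => (is => 'lazy');

    sub _build_retract_speed_mm_min {
        my $self = shift;
        return $self->retract_speed * 60;   # runs on first access only
    }
}

my $printer = My::Printer->new(retract_speed => 40);
my $before  = $printer->absolute_E;         # default value
$printer->absolute_E(1);                    # setter, allowed because is => 'rw'
my $speed   = $printer->retract_speed_mm_min;

print "$before $speed\n";
```

Because the attribute is lazy, the builder sees the retract_speed value passed to new, which would not be guaranteed with a non-lazy default.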
There's a lot more in Moo and Moose. I would suggest reading http://modernperlbooks.com/books/modern_perl_2014/07-object-oriented-perl.html and https://metacpan.org/pod/Moose::Manual and https://metacpan.org/pod/Moo.
That code basically equals
has ('absolute_E', 'is', 'rw', 'default', sub {0} );
has ('retract_speed_mm_min', 'is', 'lazy');
And has looks like a user-defined subroutine.
=> is almost the same as ,:
The => operator is a synonym for the comma except that it causes a word on its left to be interpreted as a string if it begins with a letter or underscore and is composed only of letters, digits and underscores.
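A quick way to convince yourself, using only core Perl (the variable names are arbitrary):

```perl
use strict;
use warnings;

# The bareword on the left of => is quoted automatically, even under strict.
my @with_fat_comma = (absolute_E => 'is' => 'rw');
my @with_commas    = ('absolute_E', 'is', 'rw');

my $same = "@with_fat_comma" eq "@with_commas";
print $same ? "same list\n" : "different lists\n";
```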

TYPO3: Use t3lib_TCEforms in frontend plugin

I would like to use as much standard TYPO3 as possible to create a form to edit single records from tx_mytable.
In pi1 I load the TCA for the table:
t3lib_div::loadTCA('tx_mytable');
Now I would like to use standard functions to create my form elements more or less like it is done in the backend...
I found this for the front end but cannot find any working examples:
t3lib_TCEforms_fe.php (that extends the normal t3lib_TCEforms)
Is this the right way to go or is there a better way?
I got something working, but the frontend code is not very nice.
The following link explains that the TCA alone is not enough; two additional entries are needed in the array:
http://www.martin-helmich.de/?p=15
They are itemFormElName and itemFormElValue.
// include tceforms_fe (place outside class where pibase is included)
require_once(PATH_t3lib.'class.t3lib_tceforms_fe.php');
// load TCA for table in frontend
t3lib_div::loadTCA('tx_ogcrmdb_tasks');
// init tceforms
$this->tceforms = t3lib_div::makeInstance("t3lib_TCEforms_FE");
$this->tceforms->initDefaultBEMode(); // is needed ??
$this->tceforms->backPath = $GLOBALS['BACK_PATH']; // is empty... may not be needed
//////////REPEAT FOR EACH INPUT FIELD/////////
// start create input fields, here just a single select for responsible
// conf used for tceforms similar to but not exactly like normal TCA
$conftest = array(
'itemFormElName' => $GLOBALS['TCA']['tx_ogcrmdb_tasks']['columns']['responsible']['label'],
'itemFormElValue' => 1,
'fieldConf' => array(
'config' => $GLOBALS['TCA']['tx_ogcrmdb_tasks']['columns']['responsible']['config']
)
);
// create input field
$this->content .= $this->tceforms->getSingleField_SW('','',array(),$conftest);
// wrap in form
$output = '<form action="" name="editform" method="post">';
$output .= $this->content;
$output .= '</form>';
// wrap and return output
return $output;
Still looking for a working example with a custom template for input fields.

Which recommended Perl modules can serialize Moose objects?

I was usually using Storable with nstore, but now I have a module that has CODE and apparently Storable doesn't like that.
I found YAML (and YAML::XS which I can't really get to work).
I also experimented a bit with MooseX::Storage without much success.
Are there other alternatives?
What would you recommend?
You can dump a coderef with Data::Dumper after setting $Data::Dumper::Deparse to a true value, but this is only intended for debugging purposes, not for serialization.
I would suggest you go back to looking at why MooseX::Storage isn't working out for you, as the authors tried really hard to present a well-abstracted and robust solution for Moose object serialization.
Update: it looks like you are running into issues serializing the _offset_sub attribute, as described in this question. Since that attribute has a builder, and its construction is fairly trivial (it just looks at the current value of another attribute), you shouldn't need to serialize it at all -- when you deserialize your object and want to use it again, the builder will be invoked the first time you call $this->offset. Consequently, you should just be able to mark it as "do not serialize":
use MooseX::Storage;
has '_offset_sub' => (
is => 'ro',
isa => 'CodeRef',
traits => [ 'DoNotSerialize' ],
lazy => 1,
builder => '_build_offset_sub',
init_arg => undef,
);
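As a rough, self-contained illustration of the idea (the class My::Clock and the integer offset attribute are invented here and are not the asker's actual module), the base layer of MooseX::Storage is enough to show that the coderef is skipped and rebuilt after the round trip:

```perl
use strict;
use warnings;

package My::Clock {
    use Moose;
    use MooseX::Storage;

    with Storage();   # base layer only: provides pack() and unpack()

    has offset => (is => 'ro', isa => 'Int', default => 0);

    has _offset_sub => (
        is       => 'ro',
        isa      => 'CodeRef',
        traits   => ['DoNotSerialize'],
        lazy     => 1,
        builder  => '_build_offset_sub',
        init_arg => undef,
    );

    sub _build_offset_sub {
        my $self   = shift;
        my $offset = $self->offset;        # capture the value, not $self
        return sub { $_[0] + $offset };
    }
}

my $clock = My::Clock->new(offset => 5);
$clock->_offset_sub->(1);                  # force the lazy builder to run
my $packed = $clock->pack;                 # plain hashref, coderef left out
my $copy   = My::Clock->unpack($packed);
my $result = $copy->_offset_sub->(10);     # builder runs again on the copy

print "$result\n";
```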
Lastly, this is somewhat orthogonal, but you can fold the offset and
_offset_sub attributes together by using the native attribute 'Code' trait:
has offset => (
is => 'bare',
isa => 'CodeRef',
traits => [ qw(Code DoNotSerialize) ],
lazy => 1,
builder => '_build_offset',
init_arg => undef,
handles => {
offset => 'execute_method',
},
);
sub _build_offset {
my ($self) = @_;
# same as previous _build_offset_sub...
}
Have a look at KiokuDB. It's designed with and for Moose, so it should really cover all the corners (NB: I haven't tried it myself, but I keep meaning to!)
/I3az/
I believe Data::Dump::Streamer can serialize coderefs. Haven't used it myself though.