XML::Twig: how to specify a twig_handler for a <_tag> which starts with an underscore?

I have an XML file whose root element tag is <__> (two underscores). However, when that tag name is used in the twig_handlers list, XML::Twig->new dies with the error message:
unrecognized expression in handler: '__'
Actually, ANY tag starting with an underscore produces this error except for Twig's special tags _all_ and _default_, either of which I can use to process the file at the expense of throwing away all the handler callbacks except the last.
The invocation which fails is:
XML::Twig->new (twig_handlers => { '__' => \&show })
I imagine there's an XML::Twig XPath expression which can be used here, but the CPAN documentation is pretty vague about its syntax. I also now wonder what I'd have to do to get at an element <_all_> :)
If anyone has a suggestion it would be much appreciated.
The problem only occurs when the twig is created: once processing has started (using the callback expression _all_), <__> elements at any level in the input are processed normally.
If anyone wants to play with the problem, here's the program I was using to try finding a solution. Set $xpath to the expression you want to test.
use strict;
use XML::Twig;
my $xpath = '_all_'; # <---- fails if one puts '__' here
my $xml = <<EOS; # <---- here's the XML data to process
<__>
<AA>first</AA>
<__>second</__>
</__>
EOS
sub show {
print "handler called for element ", $_->gi, ", whose children are\n";
my @children = $_->children;
for my $elt (@children) {
print "\t", $elt->gi, " holds \"", $elt->text, "\"\n";
}
1;
}
my $twig = XML::Twig->new (twig_handlers => { $xpath => \&show });
$twig->parse ($xml);

Which version of XML::Twig are you using? This is a bug that was fixed in version 3.38.
From the Changes file:
version 3.38
date: 2011-02-27
# minor maintenance release
fixed: RT 65865: _ should be allowed at the start on an XML name
https://rt.cpan.org/Ticket/Display.html?id=65865
reported by Steve Prokopowich
And indeed, when I use '__' as the value for $xpath, the code runs without errors and gives the correct output.

Related

Writing simple parser in Perl: having lexer output, where to go next?

I'm trying to write a simple data manipulation language in Perl (read-only; it's meant to transform SQL-inspired queries into filters and properties to use with the vSphere Perl API: http://pubs.vmware.com/vsphere-60/topic/com.vmware.perlsdk.pg.doc/viperl_advancedtopics.5.1.html)
I currently have something similar to lexer output, if I understand it properly: a list of tokens like this (Data::Dumper output of an array of hashes):
$VAR1 = {
'word' => 'SHOW',
'part' => 'verb',
'position' => 0
};
$VAR2 = {
'part' => 'bareword',
'word' => 'name,',
'position' => 1
};
$VAR3 = {
'word' => 'cpu,',
'part' => 'bareword',
'position' => 2
};
$VAR4 = {
'word' => 'ram',
'part' => 'bareword',
'position' => 3
};
Now what I'd like to do is to build a syntax tree. The documentation I've seen so far is mostly on using modules and generating grammars from BNF, but at the moment I can't wrap my head around it.
I'd like to tinker with relatively simple procedural code, probably recursive, to make some ugly implementation myself.
What I'm currently thinking about is building a string of $token->{'part'}s like this:
my $parts = 'verb bareword bareword ... terminator';
and then running a big and ugly regular expression against it, (ab)using Perl's capability to embed code into regular expressions: http://perldoc.perl.org/perlretut.html#A-bit-of-magic:-executing-Perl-code-in-a-regular-expression:
$parts =~ /
^verb(?{ do_something_smart })\s # Statement always starts with a verb
(bareword\s(?{ do_something_smart }))+ # Followed by one or more barewords
| # Or
# Other rules duct taped here
/x;
Whatever I've found so far requires solid knowledge of CS and/or linguistics, and I'm failing to even understand it.
What should I do with the lexer output to start understanding and tinkering with proper parsing? Something like 'build a set of temporary hashes representing smaller parts of the statement' or 'remove substrings until the string is empty and then validate what you get'.
I'm aware of the Dragon Book and SICP, but I'd like something lighter at this time.
Thanks!
As mentioned in a couple of comments above, but here again as a real answer:
You might like Parser::MGC. (Disclaimer: I'm the author of Parser::MGC)
Start by taking your existing (regexp?) definitions of various kinds of token, and turn them into "token_..." methods by using the generic_token method.
From here, you can start to build up methods to parse larger and larger structures of your grammar, by using the structure-building methods.
As for actually building an AST, it's probably simplest to start by simply emitting HASH references with keys containing the named parts of your structure. It's hard to tell a grammatical structure from the example given in the question, but you might for instance have a concept of a "command" that is a "verb" followed by some "nouns". You might parse that using:
sub parse_command
{
my $self = shift;
my $verb = $self->token_verb;
my $nouns = $self->sequence_of( sub { $self->token_noun } );
# $nouns here will be an ARRAYref
return { type => "command", verb => $verb, nouns => $nouns };
}
It's usually around this point in writing a parser that I decide I want some actual typed objects instead of mere hash references. One easy way to do this is via another of my modules, Struct::Dumb:
use Struct::Dumb qw( -named_constructors );
struct Command => [qw( verb nouns )];
...
return Command( verb => $verb, nouns => $nouns );
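If you'd rather try the plain procedural route the question asks about before reaching for a module, a hand-written parser over the existing token list is only a few dozen lines. This is a minimal sketch, not how Parser::MGC works internally; it assumes a hypothetical grammar (a verb followed by one or more comma-separated barewords, with the comma still attached to the word as in the Dumper output in the question), and parse_tokens is an illustrative name. It returns the same kind of HASH reference as parse_command above.

```perl
use strict;
use warnings;

# Hypothetical grammar: command := verb bareword ( "," bareword )*
# The lexer in the question leaves the comma on the word ('name,'),
# so a trailing comma is taken to mean "another noun follows".
sub parse_tokens {
    my ($tokens) = @_;
    my $pos = 0;

    # Look at the current token without consuming it
    my $peek = sub { $pos < @$tokens ? $tokens->[$pos] : undef };

    # Consume one token of the expected part, or die with its position
    my $expect = sub {
        my ($part) = @_;
        my $tok = $peek->();
        die "expected $part at position $pos\n"
            unless $tok && $tok->{part} eq $part;
        $pos++;
        return $tok;
    };

    my $verb = $expect->('verb');
    my @nouns;
    while (1) {
        my $word = $expect->('bareword')->{word};   # copy, token is untouched
        my $more = $word =~ s/,\z//;                # strip and remember the comma
        push @nouns, $word;
        last unless $more;
    }
    die "unexpected token at position $pos\n" if $peek->();
    return { type => 'command', verb => $verb->{word}, nouns => \@nouns };
}

my @tokens = (
    { word => 'SHOW',  part => 'verb',     position => 0 },
    { word => 'name,', part => 'bareword', position => 1 },
    { word => 'cpu,',  part => 'bareword', position => 2 },
    { word => 'ram',   part => 'bareword', position => 3 },
);

my $ast = parse_tokens(\@tokens);
print "$ast->{verb}: @{ $ast->{nouns} }\n";   # SHOW: name cpu ram
```

Each grammar rule becomes a loop or a small function that consumes tokens and dies with a position on a mismatch; supporting another statement type would mean adding another such function and choosing between them based on the verb.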

Using Perl and LibXML to obtain sub node values when namespace is used

I have the following XML as an example:
<root xmlns="http://www.plmxml.org/Schemas/PLMXMLSchema" >
<parentNode status="Good">
<A>
<B>
<C id="123" >C Node Value Here</C>
</B>
</A>
</parentNode>
</root>
There are multiple parentNode nodes in my XML file (only one shown here), so I am cycling through the parentNode elements. Once I have one, I want to obtain attribute values three levels further down in the XML. My XML uses a namespace, and I have registered the namespace in my Perl script as "plm". I can obtain the parentNode attribute value just fine using the namespace prefix in my path. But when I try to navigate down to node "C" and pick up attribute "id", I get the following error:
XPath error : Undefined namespace prefix
error : xmlXPathCompiledEval: evaluation failed
I am using the following Perl script.
use XML::LibXML;
use XML::LibXML::XPathContext;
my $filename = "namespaceissue.xml";
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($filename);
my $xc = XML::LibXML::XPathContext->new( $doc->documentElement() );
$xc->registerNs('plm', 'http://www.plmxml.org/Schemas/PLMXMLSchema');
foreach my $node ($xc->findnodes('/plm:root/plm:parentNode')) {
my $status = $node->findvalue('./@status');
print "Status = $status\n";
my $val = $node->findvalue('./plm:A/plm:B/plm:C/@title');
print "Value = $val\n";
}
If I use no namespace on the sub-nodes ./A/B/C, the script continues with no error, but no value is assigned to $val. If I add the plm: prefix I get the namespace error. Does anybody know what I am doing wrong here? Do I have to use findnodes to first find the subnodes and then extract the value with findvalue? I tried that as well and did not have any luck.
$node->findvalue('./plm:A/plm:B/plm:C/@title')
should be (the plm prefix is registered on $xc, not on $node, and the attribute you want is id, not title)
$xc->findvalue('./plm:A/plm:B/plm:C/@id', $node)
Tips:
Those leading ./ are useless.
$node->findvalue('./@status')
$xc->findvalue('./plm:A/plm:B/plm:C/@id', $node)
are the same as
$node->findvalue('@status')
$xc->findvalue('plm:A/plm:B/plm:C/@id', $node)
You can use getAttribute to get an element's attribute, so
$node->findvalue('@status')
can also be accomplished more efficiently using
$node->getAttribute('status')

Perl -- 'Not a HASH reference' error when using JSON::RPC::Client

I'm a newbie in Perl.
I have a JSON-RPC server running at http://localhost:19000 and I need to call checkEmail() method.
use JSON::RPC::Client;
my $client = new JSON::RPC::Client;
my $url = 'http://localhost:19000';
my $callobj = {
method => 'checkEmail',
params => [ 'rprikhodchenko@gmail.com' ],
};
my $res = $client->call($url, $callobj);
if($res) {
if ($res->is_error) {
print "Error : ", $res->error_message;
}
else {
print $res->result;
}
}
else {
print $client->status_line;
}
When I run it, I get the following:
perl ./check_ac.pl
Not a HASH reference at /usr/local/share/perl/5.10.1/JSON/RPC/Client.pm line 193.
Update: full stack trace:
perl -MCarp::Always ./check_ac.pl
Not a HASH reference at /usr/local/share/perl/5.10.1/JSON/RPC/Client.pm line 193
JSON::RPC::ReturnObject::new('JSON::RPC::ReturnObject', 'HTTP::Response=HASH(0x9938d48)', 'JSON=SCALAR(0x96f1518)') called at /usr/local/share/perl/5.10.1/JSON/RPC/Client.pm line 118
JSON::RPC::Client::call('JSON::RPC::Client=HASH(0x944a818)', 'http://localhost:19000', 'HASH(0x96f1578)') called at ./check_ac.pl line 11
This error means that your JSON-RPC server is not actually one, inasmuch as it does not satisfy requirement 7.3. The error is triggered when JSON::RPC::Client assumes the document returned by the JSON-RPC service is well-formed (i.e., a JSON object), and this assumption turns out to have been in error. A bug report to the author of JSON::RPC::Client would be an appropriate way to request better error messaging.
I would attack this sort of problem by finding out what the server was returning that was causing JSON::RPC::Client to choke. Unfortunately, JRC fails to provide adequate hook points for finding this out, so you'll have to be a little bit tricky.
I don't like editing external libraries, so I recommend an extend-and-override approach to instrumenting traffic with the JSON-RPC server. Something like this (in check_ac.pl):
use Data::Dumper qw();
package JSON::RPC::InstrumentedClient;
use base 'JSON::RPC::Client';
# This would be better done with Module::Install, but I'm limiting dependencies today.
sub _get {
my ($self, @args) = @_;
return $self->_dump_response($self->SUPER::_get(@args));
}
sub _post {
my ($self, @args) = @_;
return $self->_dump_response($self->SUPER::_post(@args));
}
sub _dump_response {
my ($self, $response) = @_;
# Dump is a class method: call it with ->, not as a plain function
warn Data::Dumper->Dump([$response->decoded_content], [qw(content)]);
return $response;
}
package main;
my $client = JSON::RPC::InstrumentedClient->new();
my $url = 'http://localhost:19000';
... # rest of check_ac.pl
This wraps the calls to _get and _post that JSON::RPC::Client makes internally in such a way as to let you examine what the web server actually said in response to the request we made. The above code dumps the text content of the page; this might not be the right thing in your case and will blow up if an error is encountered. It's a debugging aid only, to help you figure out from the client code side what is wrong with the server.
That's enough caveats for now, I think. Good luck.
It seems to be a bug in the new method of JSON::RPC::ReturnObject.
sub new {
my ($class, $obj, $json) = @_;
my $content = ( $json || JSON->new->utf8 )->decode( $obj->content );
#...
# line 193
$content->{error} ? $self->is_success(0) : $self->is_success(1);
#...
}
$content's value will be something returned from a JSON::decode() call. But looking at the documentation, it seems that JSON->decode() returns a scalar which could be a number, a string, an array reference, or a hash reference.
Unfortunately, JSON::RPC::ReturnObject->new() doesn't check what sort of thing JSON->decode() returned before trying to access it as a hashref. Given your error, I'm going to go ahead and assume what it got in your case was not one. :-)
I don't know if there's a way to force a fix from your code. I'd recommend contacting the author and letting him know about the issue, and/or filing a bug.
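The check that ReturnObject->new skips can be sketched as follows. This uses JSON::PP (core since Perl 5.14) rather than the JSON module, and decode_rpc_response is a hypothetical helper, not part of JSON::RPC::Client; it illustrates the missing type check rather than patching the library. allow_nonref is enabled explicitly so that bare scalars decode the same way across JSON::PP versions.

```perl
use strict;
use warnings;
use JSON::PP;

# Decode a response body and insist on a JSON object before treating it
# as a hash reference.
sub decode_rpc_response {
    my ($body) = @_;
    # allow_nonref lets bare scalars ("42", "\"text\"") decode instead of
    # dying, so we can report what actually came back
    my $content = JSON::PP->new->utf8->allow_nonref->decode($body);
    my $kind = ref $content;
    die sprintf("expected a JSON object from the server, got %s\n",
                $kind ? "$kind reference" : 'a plain scalar')
        unless $kind eq 'HASH';
    return $content;
}

my $good = decode_rpc_response('{"result":"ok","error":null}');
print "result: $good->{result}\n";    # result: ok

my $ok = eval { decode_rpc_response('[1,2,3]'); 1 };
print "rejected: $@" unless $ok;
# rejected: expected a JSON object from the server, got ARRAY reference
```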

Succinct MooseX::Declare method signature validation errors

I've been a proponent of adopting Moose (and MooseX::Declare) at work for several months. The style it encourages will really help the maintainability of our codebase, but not without some initial cost of learning new syntax, and especially in learning how to parse type validation errors.
I've seen discussion online of this problem, and thought I'd post a query to this community for:
a) known solutions
b) discussion of what validation error messages should look like
c) a proposed proof of concept that implements some of these ideas
I'll also contact the authors, but I've seen some good discussion on this forum too, so I thought I'd post something public.
#!/usr/bin/perl
use MooseX::Declare;
class Foo {
has 'x' => (isa => 'Int', is => 'ro');
method doit( Int $id, Str :$z, Str :$y ) {
print "doit called with id = " . $id . "\n";
print "z = " . $z . "\n";
print "y = " . $y . "\n";
}
method bar( ) {
$self->doit(); # 2, z => 'hello', y => 'there' );
}
}
my $foo = Foo->new( x => 4 );
$foo->bar();
Note the mismatch in the call to Foo::doit with the method's signature.
The error message that results is:
Validation failed for 'MooseX::Types::Structured::Tuple[MooseX::Types::Structured::Tuple[Object,Int],MooseX::Types::Structured::Dict[z,MooseX::Types::Structured::Optional[Str],y,MooseX::Types::Structured::Optional[Str]]]' failed with value [ [ Foo=HASH(0x2e02dd0) ], { } ], Internal Validation Error is: Validation failed for 'MooseX::Types::Structured::Tuple[Object,Int]' failed with value [ Foo{ x: 4 } ] at /usr/local/share/perl/5.10.0/MooseX/Method/Signatures/Meta/Method.pm line 441
MooseX::Method::Signatures::Meta::Method::validate('MooseX::Method::Signatures::Meta::Method=HASH(0x2ed9dd0)', 'ARRAY(0x2eb8b28)') called at /usr/local/share/perl/5.10.0/MooseX/Method/Signatures/Meta/Method.pm line 145
Foo::doit('Foo=HASH(0x2e02dd0)') called at ./type_mismatch.pl line 15
Foo::bar('Foo=HASH(0x2e02dd0)') called at ./type_mismatch.pl line 20
I think that most agree that this is not as direct as it could be. I've implemented a hack in my local copy of MooseX::Method::Signatures::Meta::Method that yields this output for the same program:
Validation failed for
'[[Object,Int],Dict[z,Optional[Str],y,Optional[Str]]]' failed with value [ [ Foo=HASH(0x1c97d48) ], { } ]
Internal Validation Error:
'[Object,Int]' failed with value [ Foo{ x: 4 } ]
Caller: ./type_mismatch.pl line 15 (package Foo, subroutine Foo::doit)
The super-hacky code that does this is
if (defined (my $msg = $self->type_constraint->validate($args, \$coerced))) {
if( $msg =~ /MooseX::Types::Structured::/ ) {
$msg =~ s/MooseX::Types::Structured:://g;
$msg =~ s/,.Internal/\n\nInternal/;
$msg =~ s/failed.for./failed for\n\n /g;
$msg =~ s/Tuple//g;
$msg =~ s/ is: Validation failed for/:/;
}
my ($pkg, $filename, $lineno, $subroutine) = caller(1);
$msg .= "\n\nCaller: $filename line $lineno (package $pkg, subroutine $subroutine)\n";
die $msg;
}
[Note: With a few more minutes of crawling the code, it looks like MooseX::Meta::TypeConstraint::Structured::validate is a little closer to the code that should be changed. In any case, the question about the ideal error message, and whether anyone is actively working on or thinking about similar changes stands.]
This accomplishes three things:
1) Less verbose, more whitespace (I debated including s/Tuple//, but am sticking with it for now)
2) Including calling file/line (with brittle use of caller(1))
3) die instead of confess -- since as I see it the main advantage of confess was finding the user's entry point into the typechecking anyway, which we can achieve in less verbose ways
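For points 2 and 3, Perl's core Carp module already offers a middle ground between die and confess: croak() reports the error at the file and line of the first caller outside the croaking package, without hand-rolled caller(1) bookkeeping or a full stack trace. A minimal sketch; Checker and require_int are hypothetical stand-ins for the signature check, and this is not how MooseX::Method::Signatures reports errors today.

```perl
use strict;
use warnings;

package Checker;
use Carp qw(croak);

# Hypothetical validator: croak() blames the code that called
# require_int(), not a line inside this package.
sub require_int {
    my ($value) = @_;
    croak 'Validation failed: expected Int, got '
        . (defined $value ? "'$value'" : 'undef')
        unless defined $value && $value =~ /\A-?\d+\z/;
    return $value;
}

package main;

# The reported location is this call site.
my $ok = eval { Checker::require_int(undef); 1 };
print $@ unless $ok;
```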
Of course I don't actually want to support this patch. My question is: What is the best way of balancing completeness and succinctness of these error messages, and are there any current plans to put something like this in place?
I'm glad you like MooseX::Declare. However, the method signature validation errors you're talking about aren't really from there, but from MooseX::Method::Signatures, which in turn uses MooseX::Types::Structured for its validation needs. Every validation error you currently see comes unmodified from MooseX::Types::Structured.
I'm also going to ignore the stack-trace part of the error message. I happen to find stack traces incredibly useful, and so does the rest of the Moose cabal. I'm not going to remove them by default.
If you want a way to turn them off, Moose needs to be changed to throw exception objects instead of strings for type-constraint validation errors and possibly other things. Those could always capture a backtrace, but the decision on whether or not to display it, or how exactly to format it when displaying, could be made elsewhere, and the user would be free to modify the default behaviour: globally, locally, lexically, whatever.
What I'm going to address is building the actual validation error messages for method signatures.
As pointed out, MooseX::Types::Structured does the actual validation work. When something fails to validate, it's its job to raise an exception. This exception currently happens to be a string, which isn't all that useful when you want to build beautiful errors, so that needs to change, similar to the issue with stack traces above.
Once MooseX::Types::Structured throws structured exception objects, which might look somewhat like
bless({
type => Tuple[Tuple[Object,Int],Dict[z,Optional[Str],y,Optional[Str]]],
err => [
0 => bless({
type => Tuple[Object,Int],
err => [
0 => undef,
1 => bless({
type => Int,
err => bless({}, 'ValidationError::MissingValue'),
}, 'ValidationError'),
],
}, 'ValidationError::Tuple'),
1 => undef,
],
}, 'ValidationError::Tuple')
we would have enough information available to actually correlate individual inner validation errors with parts of the signature in MooseX::Method::Signatures. In the above example, and given your (Int $id, Str :$z, Str :$y) signature, it'd be easy enough to know that the innermost ValidationError::MissingValue, for the second element of the tuple of positional parameters, was supposed to provide a value for $id but couldn't.
Given that, it'll be easy to generate errors such as
http://files.perldition.org/err1.png
or
http://files.perldition.org/err2.png
which is kind of what I'm going for, instead of just formatting the horrible messages we have right now more nicely. However, if one wanted to do that, it'd still be easy enough once we have structured validation exceptions instead of plain strings.
None of this is actually hard; it just needs doing. If anyone feels like helping out with this, come talk to us in #moose on irc.perl.org.
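To make the idea of correlating structured errors with the signature concrete, here is a sketch that walks an error tree shaped like the one above and names the failing parameter. The ValidationError classes and their type/err fields are assumptions taken from that sketch (MooseX::Types::Structured does not throw these objects), simplified so that err is a plain array with one slot per tuple element, undef when that element passed. Named (Dict) arguments are only reported generically, since they would be looked up by key rather than by position.

```perl
use strict;
use warnings;

# Hypothetical exception classes shaped like the sketch above
package ValidationError;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package ValidationError::Tuple;
our @ISA = ('ValidationError');

package ValidationError::MissingValue;
our @ISA = ('ValidationError');

package main;

# Recursively collect [ \@index_path, $leaf_error ] for every failing leaf.
sub failure_paths {
    my ($err, @path) = @_;
    return [ [@path], $err ] unless $err->isa('ValidationError::Tuple');
    my @found;
    my $slots = $err->{err};
    for my $i (0 .. $#$slots) {
        push @found, failure_paths($slots->[$i], @path, $i)
            if defined $slots->[$i];
    }
    return @found;
}

# Error for $self->doit() against (Int $id, Str :$z, Str :$y):
# outer tuple = [positional tuple, named dict]; positional = [Object, Int].
my $error = ValidationError::Tuple->new(
    type => 'Tuple[Tuple[Object,Int],Dict[...]]',
    err  => [
        ValidationError::Tuple->new(
            type => 'Tuple[Object,Int]',
            err  => [ undef, ValidationError::MissingValue->new(type => 'Int') ],
        ),
        undef,    # named arguments validated fine
    ],
);

# Names for the positional slots; slot 0 is the invocant.
my @positional = ('$self', '$id');

my @messages;
for my $failure (failure_paths($error)) {
    my ($path, $leaf) = @$failure;
    my ($group, $index) = @$path;
    my $name = $group == 0 ? $positional[$index] : 'a named argument';
    my $what = $leaf->isa('ValidationError::MissingValue')
        ? 'missing value' : 'invalid value';
    push @messages, "$what for $name (expected $leaf->{type})";
}
print "$_\n" for @messages;    # missing value for $id (expected Int)
```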
Method::Signatures::Modifiers is a package which hopes to fix some of the problems of MooseX::Method::Signatures. Simply use it after MooseX::Declare to override its method keyword.
use MooseX::Declare;
use Method::Signatures::Modifiers;
class Foo
{
method bar (Int $thing) {
# this method is declared with Method::Signatures instead of MooseX::Method::Signatures
}
}

"Can't call method "dir_path" on an undefined value" when running Mason component on the command line

Greetings,
I'm trying to develop some tests for Mason components, which requires running them on the command line instead of under the web server. When I try this, I get an error:
perl -MHTML::Mason::Request -MHTML::Mason::Interp -I./lib \
-e '$int = HTML::Mason::Interp->new( data_dir => "/home/friedo/cache", comp_root => "/home/friedo/comps" ); $m = HTML::Mason::Request->new( comp => "/dummy", interp => $int ); $m->comp("/dummy")'
Results in:
Can't call method "dir_path" on an undefined value at lib/HTML/Mason/Request.pm line 1123.
The error is thrown when the call to ->comp is attempted. I can't figure out what's wrong with the configuration. The component is there and appears to be compiled just fine, and it works via Apache.
This is using HTML::Mason 1.35.
Edit: Let's try a bounty for this one. The alternative is me having to dive deep into Mason's guts! :)
Edit again: Thanks very much to David for pointing out the crucial detail that I missed for getting this to work.
This was actually for a test framework that needed to exercise a module that calls some Mason comps -- under normal operation the module is provided with a Mason request object to use for that purpose, but I couldn't get that to work offline. The key was using an Interpreter object instead, so I ended up doing the following, which is a little silly but makes the tests work:
sub _mason_out {
...
my $buf;
if ( $ENV{MASON_TEST} ) {
my $int = HTML::Mason::Interp->new( comp_root => $self->{env}->comp_dir,
out_method => \$buf );
$int->exec( $comp, %args );
} else {
my $m = $self->{mason_object};
$m->comp( { store => \$buf }, $comp, %args );
}
return $buf;
}
I think this fails because your Request object hasn't built a component stack at the point where ->comp is called. Use the Interp->exec() method instead, as described in "Using Mason from a Standalone Script":
perl -MHTML::Mason::Interp -I./lib \
-e 'HTML::Mason::Interp->new( data_dir => "/home/friedo/cache", comp_root => "/home/friedo/comps" )->exec("/dummy")'