Writing simple parser in Perl: having lexer output, where to go next? - perl

I'm trying to write a simple data manipulation language in Perl (read-only, it's meant to transform SQL-inspired queries into filters and properties to use with vSphere Perl API: http://pubs.vmware.com/vsphere-60/topic/com.vmware.perlsdk.pg.doc/viperl_advancedtopics.5.1.html)
I currently have something similar to lexer output if I understand it properly - a list of tokens like this (Data::Dumper prints array of hashes):
$VAR1 = {
'word' => 'SHOW',
'part' => 'verb',
'position' => 0
};
$VAR2 = {
'part' => 'bareword',
'word' => 'name,',
'position' => 1
};
$VAR3 = {
'word' => 'cpu,',
'part' => 'bareword',
'position' => 2
};
$VAR4 = {
'word' => 'ram',
'part' => 'bareword',
'position' => 3
};
Now what I'd like to do is to build a syntax tree. The documentation I've seen so far is mostly on using modules and generating grammars from BNF, but at the moment I can't wrap my head around it.
I'd like to tinker with relatively simple procedural code, probably recursive, to make some ugly implementation myself.
What I'm currently thinking about is building a string of $token->{'part'}s like this:
my $parts = 'verb bareword bareword ... terminator';
and then running a big and ugly regular expression against it, (ab)using Perl's capability to embed code into regular expressions: http://perldoc.perl.org/perlretut.html#A-bit-of-magic:-executing-Perl-code-in-a-regular-expression:
$parts =~ /
^verb(?{ do_something_smart })\s # Statement always starts with a verb
(bareword\s(?{ do_something_smart }))+ # Followed by one or more barewords
| # Or
# Other rules duct taped here
/x;
Whatever I've found so far requires solid knowledge of CS and/or linguistics, and I'm failing to even understand it.
What should I do about lexer output to start understanding and tinker with proper parsing? Something like 'build a set of temporary hashes representing smaller part of statement' or 'remove substrings until the string is empty and then validate what you get'.
I'm aware of the Dragon Book and SICP, but I'd like something lighter at this time.
Thanks!

As mentioned in a couple of comments above, but here again as a real answer:
You might like Parser::MGC. (Disclaimer: I'm the author of Parser::MGC)
Start by taking your existing (regexp?) definitions of various kinds of token, and turn them into "token_..." methods by using the generic_token method.
From here, you can start to build up methods to parse larger and larger structures of your grammar, by using the structure-building methods.
As for actually building an AST, the simplest way to start is probably to emit HASH references whose keys name the parts of your structure. It's hard to tell the grammatical structure from the example given in the question, but you might for instance have a concept of a "command" that is a "verb" followed by some "nouns". You might parse that using:
sub parse_command
{
my $self = shift;
my $verb = $self->token_verb;
my $nouns = $self->sequence_of( sub { $self->token_noun } );
# $nouns here will be an ARRAYref
return { type => "command", verb => $verb, nouns => $nouns };
}
It's usually around this point in writing a parser that I decide I want some actual typed objects instead of mere hash references. One easy way to do this is via another of my modules, Struct::Dumb:
use Struct::Dumb qw( -named_constructors );
struct Command => [qw( verb nouns )];
...
return Command( verb => $verb, nouns => $nouns );
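For comparison, the same "verb followed by nouns" rule can be written by hand over the token list from the question, without any parsing module. This is only a minimal sketch: accept_part and parse_command are hypothetical helper names, the token list assumes the lexer has already stripped the trailing commas, and the grammar (one verb, then one or more barewords) is a guess at what the question intends.

```perl
use strict;
use warnings;

# Token list shaped like the lexer output in the question
# (assumption: trailing commas have already been stripped).
my @tokens = (
    { word => 'SHOW', part => 'verb',     position => 0 },
    { word => 'name', part => 'bareword', position => 1 },
    { word => 'cpu',  part => 'bareword', position => 2 },
    { word => 'ram',  part => 'bareword', position => 3 },
);

# Hypothetical helper: consume and return the next token if it has the
# wanted part, otherwise leave the list alone and return undef.
sub accept_part {
    my ($toks, $part) = @_;
    return undef unless @$toks && $toks->[0]{part} eq $part;
    return shift @$toks;
}

# command := verb bareword+
sub parse_command {
    my ($toks) = @_;
    my $verb = accept_part($toks, 'verb')
        or die "Expected a verb at the start of the statement\n";
    my @nouns;
    while (my $tok = accept_part($toks, 'bareword')) {
        push @nouns, $tok->{word};
    }
    die "Expected at least one bareword after '$verb->{word}'\n" unless @nouns;
    die "Unexpected token '$toks->[0]{word}' at position $toks->[0]{position}\n"
        if @$toks;
    return { type => 'command', verb => $verb->{word}, nouns => \@nouns };
}

my $ast = parse_command([@tokens]);    # pass a copy so @tokens survives
print "$ast->{verb}: @{ $ast->{nouns} }\n";
```

Each grammar rule becomes one sub that consumes tokens from the front of the list and returns a hash reference, so larger structures can be built by having one rule's sub call another's.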

Related

How to fetch values that are hard coded in a Perl subroutine?

I have a perl code like this:
use constant OPERATING_MODE_MAIN_ADMIN => 'super_admin';
use constant OPERATING_MODE_ADMIN => 'admin';
use constant OPERATING_MODE_USER => 'user';
sub system_details
{
return {
operating_modes => {
values => [OPERATING_MODE_MAIN_ADMIN, OPERATING_MODE_ADMIN, OPERATING_MODE_USER],
help => {
'super_admin' => 'The system displays the settings for super admin',
'admin' => 'The system displays settings for normal admin',
'user' => 'No settings are displayed. Only user level pages.'
}
},
log_level => {
values => [qw(FATAL ERROR WARN INFO DEBUG TRACE)],
help => "http://search.cpan.org/~mschilli/Log-Log4perl-1.49/lib/Log/Log4perl.pm#Log_Levels"
},
};
}
How will I access the "value" fields and "help" fields of each key from another subroutine? Suppose I want the values of operating_mode alone or log_level alone?
The system_details() returns a hashref, which has two keys with values being hashrefs. So you can dereference the sub's return and assign into a hash, and then extract what you need
my %sys = %{ system_details() };
my @loglevel_vals = @{ $sys{log_level}->{values} };
my $help_msg = $sys{log_level}->{help};
The @loglevel_vals array contains FATAL, ERROR etc, while $help_msg has the message string.
This makes an extra copy of a hash while one can work with a reference, as in doimen's answer
my $sys = system_details();
my @loglevel_vals = @{ $sys->{log_level}->{values} };
But as the purpose is to interrogate the data in another sub it also makes sense to work with a local copy, which is generally safer (against accidentally changing data in the caller).
There are modules that help with deciphering complex data structures by displaying them. This helps in devising ways to work with the data. Often quoted is Data::Dumper, which also does more than show data. Some of the others are meant to simply display the data. A couple of nice ones are Data::Dump and Data::Printer.
my $sys = system_details;
my $log_level = $sys->{'log_level'};
my @values = @{ $log_level->{'values'} };
my $help = $log_level->{'help'};
If you need to introspect the type of structure stored in help (for example help in operating_mode is a hash, but in log_level it is a string), use the ref builtin func.
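A small sketch of that check follows. The $sys structure is a cut-down stand-in for what system_details() returns, and describe_help is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Stand-in data shaped like the question's system_details() return value.
my $sys = {
    operating_modes => { help => { admin => 'settings for normal admin' } },
    log_level       => { help => 'placeholder help string' },
};

# Hypothetical helper: report what kind of thing is stored under 'help'.
sub describe_help {
    my ($help) = @_;
    my $kind = ref $help;    # '' for a plain scalar, 'HASH' for a hash reference
    return "hash with keys: " . join(', ', sort keys %$help) if $kind eq 'HASH';
    return "list: @$help"                                    if $kind eq 'ARRAY';
    return "string: $help";
}

for my $key (sort keys %$sys) {
    print "$key -> ", describe_help($sys->{$key}{help}), "\n";
}
```

ref returns an empty string for non-references, so the string case can simply be the fallthrough.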

Attempt to access upserted_id property in perl MongoDB Driver returns useless HASH(0x3572074)

I have a Perl script that pulls a table from a SQL database ($row variable) and attempts to do a MongoDB update like so:
my $res = $users->update({"meeting_id" => $row[0]},
{'$set' => {
"meeting_id" => $row[0],
"case_id" => $row[1],
"case_desc" => $row[2],
"date" => $row[3],
"start_time" => $row[4],
"end_time" => $row[5],
#"mediator_LawyerID" => $row[6],
"mediator_LawyerIDs" => \@medLawIds,
"case_number" => $row[6],
"case_name" => $row[7],
"location" => $row[8],
"number_of_parties" => $row[9],
"case_manager" => $row[10],
"last_updated" => $row[11],
"meeting_result" => $row[12],
"parties" => \@partyList
}},
{'upsert' => 1}) or die "I ain't update!!!";
My client now wants ICS style calendar invites sent to their mediators. Thus, I need to know whether an update or insert happened. The documentation for MongoDB::UpdateResult implies that this is how you access such a property:
my $id = $res->upserted_id;
So I tried:
bless ($res,"MongoDB::UpdateResult");
my $id = $res->upserted_id;
After this code $id is like:
HASH(0x356f8fc)
Are these the actual IDs? If so, how do I convert to a hexadecimal string that can be cast to Mongo's ObjectId type? It should be noted I know absolutely nothing about Perl; if more of the code is relevant, at request I will post any section ASAP. It's 300 lines so I didn't want to include the whole file off the bat.
EDIT: I should mention before anyone suggests this that using update_one instead of update returns the exact same result.
HASH(0x356f8fc) is a Perl Hash reference. It's basically some kind of (internal) memory address of some data.
The easiest way to get the contents is Data::Dumper:
use Data::Dumper;
[...]
my $result = $res->upserted_id;
print Dumper($result);
HASH(0x356f8fc) is just the human readable representation of the real pointer. You must dump it in the same process and can't pass it from one to another.
You'll probably end up with something like
my $id = $result->{_id};
See the perlref manpage for details.
See also the MongoDB documentation about write concern.
PS: Also remember that you could use your own IDs for MongoDB. You don't need to work with the generated ones.
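To see the difference between the printed reference and its contents without involving MongoDB at all, here is a minimal sketch. The $result hash is a made-up stand-in; the real key names and values depend on what the driver returns, which is exactly what Dumper will tell you.

```perl
use strict;
use warnings;
use Data::Dumper;

# Stand-in for whatever the driver returned; key and value are made up.
my $result = { _id => '5f2b6c0e9d1e8a3b4c5d6e7f' };

print "$result\n";           # something like HASH(0x...); the address varies
print ref($result), "\n";    # HASH

local $Data::Dumper::Sortkeys = 1;
print Dumper($result);       # shows the keys and values inside the reference

my $id = $result->{_id};     # dereference with -> to reach a single value
print "$id\n";
```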

Best way to check for incorrect hash key input

In my Perl script, I have subroutine that is called hundreds of times with as many different sets of parameters, as the only values that are sent in are ones that differ from the defaults. (It goes without saying that the number of permutations and combinations is very large) To make it more robust, I would like to do some checking on the parameters. Here is a shrunken version of my subroutine (the actual version has dozens of parameters with very specific, sometimes lengthy names):
# Obtain any parameters that differ from the defaults and send for processing
sub importantSub
{
my %params =
(
commandType => 5,
commandId => 38,
channel1Enable => 0,
channel2Enable => 0,
channel3Enable => 0,
channel4Enable => 0,
channel5Enable => 0,
channel6Enable => 0,
channel7Enable => 0,
channel8Enable => 0,
channel9Enable => 0,
channel10Enable => 0,
# This goes on for a VERY long time
@_
);
# Make sure we have exactly as many keys as we expect - verify that
# no additional parameters were added (Real version has 92)
if( keys(%params) != 92 )
{
croak("Unexpected parameter in hash!");
}
return &$privateProcessingFunction('Data Path Configuration', \%params);
}
As you can see, I currently check that the number of keys is what I expect, because if something is sent in as "chan1Enable" instead of "channel1Enable", it will throw that number off.
But with so many calls to the subroutine from multiple other scripts written by multiple other engineers, I would like to find a way to find WHICH value was incorrect (e.g. Don't just say that there was an unexpected parameter, say that "chan1Enable" was invalid). Furthermore, if multiple values were incorrect, I'd like to list all of them.
What is the most efficient way to do this?
(I ask about efficiency since the function is currently called in over 400 different ways and that will likely continue to grow as the application expands.)
There are two kinds of errors: supplying an unrecognized parameter, or failing to supply a recognized parameter. You'll have to worry about the second issue as you edit the list of parameters and make sure that the new parameters are used consistently throughout the application.
The best and easiest solution is to use another hash.
my @params = qw(commandType commandId channel1Enable ...);
my %copy = %params;
my @validation_errors = ();
# are all the required parameters present?
foreach my $param (@params) {
if (not exists $copy{$param}) {
push @validation_errors, "Required param '$param' is missing.";
}
delete $copy{$param};
}
# since we have delete'd all the recognized parameters,
# anything left is unrecognized
foreach my $param (keys %copy) {
push @validation_errors, "Unrecognized param '$param' = '$copy{$param}' in input.";
}
if (@validation_errors) {
die "errors in input:\n", join("\n", @validation_errors);
}
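Because the subroutine in the question already holds a hash of defaults, a variation on the same idea is to check the caller's keys against that hash before merging. The sketch below assumes a shortened %DEFAULTS and a hypothetical build_params helper; it reports every unknown name at once and costs one hash lookup per supplied key.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Shortened stand-in for the real defaults (the question's list has 92 entries).
my %DEFAULTS = (
    commandType    => 5,
    commandId      => 38,
    channel1Enable => 0,
    channel2Enable => 0,
);

# Hypothetical helper: merge caller overrides into the defaults,
# rejecting any name that is not a known default.
sub build_params {
    my %overrides = @_;
    my @unknown = grep { !exists $DEFAULTS{$_} } sort keys %overrides;
    croak("Unrecognized parameter(s): @unknown") if @unknown;
    return { %DEFAULTS, %overrides };
}

my $ok = build_params(channel1Enable => 1);
print "channel1Enable = $ok->{channel1Enable}\n";

# Two misspelled names: both are listed in the error message.
my $err = eval { build_params(chan1Enable => 1, chanel2Enable => 1); 1 } ? '' : $@;
print $err;
```

Since every recognized parameter has a default, there is no "missing parameter" case to check in this variant.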
I recommend using a formal tool to help validate the parameters you're passing in. Params::Validate is tried and true, while Type::Params is a recent take on the problem space, allowing you to use the same set of constraints that you would also use with Moo or Moose.
Here's the kind of diagnostic that Params::Validate would give you for
an unrecognized parameter:
use Params::Validate ':all';
sub foo {
my %p = validate(@_, {
first_required => 1,
second_required => 1,
first_optional => 0,
});
}
foo( boom => 'zoom' );
Results in:
The following parameter was passed in the call to main::foo but was not listed in the validation options: boom
at /tmp/t.pl line 7
main::foo('boom', 'zoom') called at /tmp/t.pl line 14

How to build a hashref with arrays in perl?

I am having trouble building what I think is a hashref (href) in Perl with XML::Simple.
I am new to this so I'm not sure how to go about it, and I can't find much on building this href with arrays. All the examples I have found are for a normal href.
The code below outputs the right XML bit, but I am really struggling with how to add more to this href
Thanks
Dario
use XML::Simple;
$test = {
book => [
{
'name' => ['createdDate'],
'value' => [20141205]
},
{
'name' => ['deletionDate'],
'value' => [20111205]
},
]
};
$test ->{book=> [{'name'=> ['name'],'value'=>['Lord of the rings']}]};
print XMLout($test,RootName=>'library');
To add a new hash to the array-ref 'book', you need to dereference the array-ref as an array and then push on to it. @{ $test->{book} } dereferences the array-ref into an array.
push @{ $test->{book} }, { name => ['name'], value => ['The Hobbit'] };
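As a self-contained check that leaves XML::Simple out of the picture, the sketch below rebuilds the structure from the question, pushes one more entry, and prints the count and the last value.

```perl
use strict;
use warnings;

my $test = {
    book => [
        { name => ['createdDate'],  value => [20141205] },
        { name => ['deletionDate'], value => [20111205] },
    ],
};

# @{ ... } dereferences the array reference so push can append to it.
push @{ $test->{book} }, { name => ['name'], value => ['Lord of the rings'] };

printf "%d entries, last value: %s\n",
    scalar @{ $test->{book} },
    $test->{book}[-1]{value}[0];
```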
XML::Simple is a pain because you're never sure whether you need an array or a hash, and it is hard to distinguish between elements and attributes.
I suggest you make a move to XML::API. This program demonstrates how it would be used to create the same XML data as your own program that uses XML::Simple.
It has an advantage because it builds a data structure in memory that properly represents the XML. Data can be added linearly, like this, or you can store bookmarks within the structure and go back and add information to nodes created previously.
This code adds the two book elements in different ways. The first is the standard way, where the element is opened, the name and value elements are added, and the book element is closed again. The second shows the _ast (abstract syntax tree) method that allows you to pass data in nested arrays similar to those in XML::Simple for conciseness. This structure requires you to prefix attribute names with a hyphen - to distinguish them from element names.
use strict;
use warnings;
use XML::API;
my $xml = XML::API->new;
$xml->library_open;
$xml->book_open;
$xml->name('createdDate');
$xml->value('20141205');
$xml->book_close;
$xml->_ast(book => [
name => 'deletionDate',
value => '20111205',
]);
$xml->library_close;
print $xml;
output
<?xml version="1.0" encoding="UTF-8" ?>
<library>
<book>
<name>createdDate</name>
<value>20141205</value>
</book>
<book>
<name>deletionDate</name>
<value>20111205</value>
</book>
</library>

Succinct MooseX::Declare method signature validation errors

I've been a proponent of adopting Moose (and MooseX::Declare) at work for several months. The style it encourages will really help the maintainability of our codebase, but not without some initial cost of learning new syntax, and especially in learning how to parse type validation errors.
I've seen discussion online of this problem, and thought I'd post a query to this community for:
a) known solutions
b) discussion of what validation error messages should look like
c) propose a proof of concept that implements some ideas
I'll also contact the authors, but I've seen some good discussion in this forum too, so I thought I'd post something public.
#!/usr/bin/perl
use MooseX::Declare;
class Foo {
has 'x' => (isa => 'Int', is => 'ro');
method doit( Int $id, Str :$z, Str :$y ) {
print "doit called with id = " . $id . "\n";
print "z = " . $z . "\n";
print "y = " . $y . "\n";
}
method bar( ) {
$self->doit(); # 2, z => 'hello', y => 'there' );
}
}
my $foo = Foo->new( x => 4 );
$foo->bar();
Note the mismatch in the call to Foo::doit with the method's signature.
The error message that results is:
Validation failed for 'MooseX::Types::Structured::Tuple[MooseX::Types::Structured::Tuple[Object,Int],MooseX::Types::Structured::Dict[z,MooseX::Types::Structured::Optional[Str],y,MooseX::Types::Structured::Optional[Str]]]' failed with value [ [ Foo=HASH(0x2e02dd0) ], { } ], Internal Validation Error is: Validation failed for 'MooseX::Types::Structured::Tuple[Object,Int]' failed with value [ Foo{ x: 4 } ] at /usr/local/share/perl/5.10.0/MooseX/Method/Signatures/Meta/Method.pm line 441
MooseX::Method::Signatures::Meta::Method::validate('MooseX::Method::Signatures::Meta::Method=HASH(0x2ed9dd0)', 'ARRAY(0x2eb8b28)') called at /usr/local/share/perl/5.10.0/MooseX/Method/Signatures/Meta/Method.pm line 145
Foo::doit('Foo=HASH(0x2e02dd0)') called at ./type_mismatch.pl line 15
Foo::bar('Foo=HASH(0x2e02dd0)') called at ./type_mismatch.pl line 20
I think that most agree that this is not as direct as it could be. I've implemented a hack in my local copy of MooseX::Method::Signatures::Meta::Method that yields this output for the same program:
Validation failed for
'[[Object,Int],Dict[z,Optional[Str],y,Optional[Str]]]' failed with value [ [ Foo=HASH(0x1c97d48) ], { } ]
Internal Validation Error:
'[Object,Int]' failed with value [ Foo{ x: 4 } ]
Caller: ./type_mismatch.pl line 15 (package Foo, subroutine Foo::doit)
The super-hacky code that does this is
if (defined (my $msg = $self->type_constraint->validate($args, \$coerced))) {
if( $msg =~ /MooseX::Types::Structured::/ ) {
$msg =~ s/MooseX::Types::Structured:://g;
$msg =~ s/,.Internal/\n\nInternal/;
$msg =~ s/failed.for./failed for\n\n /g;
$msg =~ s/Tuple//g;
$msg =~ s/ is: Validation failed for/:/;
}
my ($pkg, $filename, $lineno, $subroutine) = caller(1);
$msg .= "\n\nCaller: $filename line $lineno (package $pkg, subroutine $subroutine)\n";
die $msg;
}
[Note: With a few more minutes of crawling the code, it looks like MooseX::Meta::TypeConstraint::Structured::validate is a little closer to the code that should be changed. In any case, the question about the ideal error message, and whether anyone is actively working on or thinking about similar changes stands.]
Which accomplishes 3 things:
1) Less verbose, more whitespace (I debated including s/Tuple//, but am sticking with it for now)
2) Including calling file/line (with brittle use of caller(1))
3) die instead of confess -- since as I see it the main advantage of confess was finding the user's entry point into the typechecking anyway, which we can achieve in less verbose ways
Of course I don't actually want to support this patch. My question is: What is the best way of balancing completeness and succinctness of these error messages, and are there any current plans to put something like this in place?
I'm glad you like MooseX::Declare. However, the method signature validation
errors you're talking about aren't really from there, but from
MooseX::Method::Signatures, which in turn uses MooseX::Types::Structured for
its validation needs. Every validation error you currently see comes unmodified
from MooseX::Types::Structured.
I'm also going to ignore the stack-trace part of the error message. I happen to
find them incredibly useful, and so does the rest of the Moose cabal. I'm not going
to remove them by default.
If you want a way to turn them off, Moose needs to be changed to throw exception
objects instead of strings for type-constraint validation errors and possibly
other things. Those could always capture a backtrace, but the decision on
whether or not to display it, or how exactly to format it when displaying, could
be made elsewhere, and the user would be free to modify the default behaviour -
globally, locally, lexically, whatever.
What I'm going to address is building the actual validation error messages for
method signatures.
As pointed out, MooseX::Types::Structured does the actual validation
work. When something fails to validate, it's its job to raise an exception. This
exception currently happens to be a string, so it's not all that useful when
wanting to build beautiful errors, so that needs to change, similar to the issue
with stack traces above.
Once MooseX::Types::Structured throws structured exception objects, which might
look somewhat like
bless({
type => Tuple[Tuple[Object,Int],Dict[z,Optional[Str],y,Optional[Str]]],
err => [
0 => bless({
type => Tuple[Object,Int],
err => [
0 => undef,
1 => bless({
type => Int,
err => bless({}, 'ValidationError::MissingValue'),
}, 'ValidationError'),
],
}, 'ValidationError::Tuple'),
1 => undef,
],
}, 'ValidationError::Tuple')
we would have enough information available to actually correlate individual
inner validation errors with parts of the signature in MooseX::Method::Signatures. In the above example, and
given your (Int $id, Str :$z, Str :$y) signature, it'd be easy enough to know
that the innermost ValidationError::MissingValue for the second element of the
tuple for positional parameters was supposed to provide a value for $id, but
couldn't.
Given that, it'll be easy to generate errors such as
http://files.perldition.org/err1.png
or
http://files.perldition.org/err2.png
which is kind of what I'm going for, instead of just formatting the horrible
messages we have right now more nicely. However, if one wanted to do that, it'd
still be easy enough once we have structured validation exceptions instead of
plain strings.
None of this is actually hard - it just needs doing. If anyone feels like helping
out with this, come talk to us in #moose on irc.perl.org.
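To make the idea of exception objects concrete, here is a minimal sketch in plain Perl. My::ValidationError, its fields, and its message format are all hypothetical; this is not an API of Moose or MooseX::Types::Structured, only an illustration of why an object is easier to format than a string.

```perl
use strict;
use warnings;

# Hypothetical exception class; the name and fields are illustrative only.
package My::ValidationError {
    # Stringify via message() so the object still prints like an error string.
    use overload '""' => sub { $_[0]->message }, fallback => 1;

    sub new {
        my ($class, %args) = @_;
        return bless {%args}, $class;
    }
    sub type  { $_[0]{type} }
    sub param { $_[0]{param} }
    sub message {
        my ($self) = @_;
        return "Missing value for $self->{param} (expected $self->{type})";
    }
}

# Throw an object instead of a string ...
my $ok = eval {
    die My::ValidationError->new(type => 'Int', param => '$id');
    1;
};
my $err = $@;

# ... so the catcher can inspect fields or fall back to the string form.
if (!$ok && ref $err && $err->isa('My::ValidationError')) {
    print "field access: ", $err->param, " needs ", $err->type, "\n";
    print "stringified:  $err\n";
}
```

The code that catches the exception decides how much detail to show, which is the separation the answer argues for.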
Method::Signatures::Modifiers is a package which hopes to fix some of the problems of MooseX::Method::Signatures. Simply use it to overload.
use MooseX::Declare;
use Method::Signatures::Modifiers;
class Foo
{
method bar (Int $thing) {
# this method is declared with Method::Signatures instead of MooseX::Method::Signatures
}
}