Overcoming "local ambiguity: multiple parsing options:" in Rust Macros - macros

I was experimenting with Rust's macro_rules and wanted to make a macro which could parse an HTML-like syntax and simply echo the HTML as a string. The macro below gets most of the way there:
macro_rules! html {
() => ("");
($text:tt) => {{
format!("{}", $text)
}};
(<$open:ident>[$($children:tt)*]</$close:ident>$($rest:tt)*) => {{
format!("<{}>{}</{}>{}",
stringify!($open),
html!($($children)*),
stringify!($close),
html!($($rest)*))
}};
}
and then to use the macro:
println!("{}",
html!(
<html>[
<head>[
<title>["Some Title"]</title>
]</head>
<body>[
<h1>["This is a header!"]</h1>
]</body>
]</html>
)
);
However, I would really like to remove the extraneous opening and closing square brackets. I attempt to do that as follows:
macro_rules! html_test {
() => ("");
($text:tt) => {{
format!("{}", $text)
}};
(<$open:ident>$($children:tt)*</$close:ident>$($rest:tt)*) => {{
format!("<{}>{}</{}>{}",
stringify!($open),
html_test!($($children)*),
stringify!($close),
html_test!($($rest)*))
}};
}
However, when I go to use this macro:
println!("{}",
html_test!(
<html>
<head>
<title>"Some Title"</title>
</head>
<body>
<h1>"This is a header!"</h1>
</body>
</html>
)
);
I get the error: local ambiguity: multiple parsing options: built-in NTs tt ('children') or 1 other option.
I know the general solution to this error is to add syntax to disambiguate the cases (such as adding the square brackets). Is there any other way around this issue for this specific example? I know using procedural macros would be an extreme solution, but I would prefer to use macro_rules if at all possible.
I realize using a macro to simply get a string containing HTML is overkill, but it was solely for the sake of this question. Potentially, one could do much more interesting things with the macro, such as calling functions to build up a tree representing the HTML structure.

Do you want the macro to actually be usable? Then no. Actually, why even use a macro here at all? No matter what you do, you're going to be fighting the Rust lexer at some point. Just write the HTML in a string literal like:
r##"<html>
<head>
<title>Some Title</title>
</head>
<body>
<h1>This is a header!</h1>
</body>
</html>"##
That or accept that macro input cannot match actual HTML syntax, close tab, move on.
You're still here? Oh, so you don't care about usability or performance? You really want a marginal improvement in syntax, no matter the cost? *rolls up sleeves*
Be careful what you wish for.
You need to use an incremental parser, which allows you to bypass some of the ambiguous parse issues. Rather than trying to match a non-delimited group (which you can't do), you instead recursively match unique prefixes. Doing that leads to:
macro_rules! html_test {
(#soup {$($parts:expr,)*}, [], ) => {
concat!($($parts),*)
};
(#soup $parts:tt, [$head:ident $($stack:ident)*], ) => {
compile_error!(
concat!(
"unexpected end of HTML; the following elements need closing: ",
stringify!($head),
$(",", stringify!($stack),)*
"."
)
)
};
(#soup {$($parts:tt)*}, [$ex_close:ident $($stack:ident)*], </$got_close:ident> $($tail:tt)*) => {
{
macro_rules! cmp {
($ex_close) => {
html_test!(
#soup
{$($parts)* "</", stringify!($ex_close), ">",},
[$($stack)*], $($tail)*
)
};
($got_close) => {
compile_error!(
concat!(
"closing element mismatch: expected `",
stringify!($ex_close),
"`, got `",
stringify!($got_close),
"`"
)
)
};
}
cmp!($got_close)
}
};
(#soup {$($parts:tt)*}, $stack:tt, <img $($tail:tt)*) => {
html_test!(#tag {$($parts)* "<img",}, $stack, $($tail)*)
};
(#soup {$($parts:tt)*}, [$($stack:ident)*], <$open:ident $($tail:tt)*) => {
html_test!(
#tag
{$($parts)* "<", stringify!($open),},
[$open $($stack)*],
$($tail)*
)
};
(#soup {$($parts:tt)*}, $stack:tt, $text:tt $($tail:tt)*) => {
html_test!(#soup {$($parts)* $text,}, $stack, $($tail)*)
};
(#tag {$($parts:tt)*}, $stack:tt, > $($tail:tt)*) => {
html_test!(#soup {$($parts)* ">",}, $stack, $($tail)*)
};
(#tag {$($parts:tt)*}, $stack:tt, $name:ident=$value:tt $($tail:tt)*) => {
html_test!(
#tag
{$($parts)* " ", stringify!($name), "=", stringify!($value),},
$stack, $($tail)*
)
};
($($tts:tt)*) => {
html_test! { #soup {}, [], $($tts)* }
};
}
This works by crawling over the input tokens, keeping track of the string pieces that need to be output (in $($parts)*), and the opened tags that need closing (in $($stack)*). Once it's out of input, and the stack is empty, it concat!s all the parts together, producing a single static string literal.
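The accumulate-and-recurse pattern described above can be seen in isolation in a much smaller macro. The following is only a sketch, not the macro from this answer: the names `tags`, `@acc` and `demo` are made up for illustration, and it keeps the list of string pieces but drops the stack of open tags, so it does not check that tags are balanced.

```rust
// Hypothetical cut-down macro: `tags` and `@acc` are illustrative names.
macro_rules! tags {
    // Out of input: glue the accumulated pieces into one string literal.
    (@acc {$($parts:tt)*}) => {
        concat!($($parts)*)
    };
    // Unique prefix `</name>`: a closing tag.
    (@acc {$($parts:tt)*} </ $name:ident > $($tail:tt)*) => {
        tags!(@acc {$($parts)* "</", stringify!($name), ">",} $($tail)*)
    };
    // Unique prefix `<name>`: an opening tag (no attributes in this sketch).
    (@acc {$($parts:tt)*} < $name:ident > $($tail:tt)*) => {
        tags!(@acc {$($parts)* "<", stringify!($name), ">",} $($tail)*)
    };
    // Fallback: any other single token tree, expected to be a string literal.
    (@acc {$($parts:tt)*} $text:tt $($tail:tt)*) => {
        tags!(@acc {$($parts)* $text,} $($tail)*)
    };
    // Entry point: start recursing with an empty accumulator.
    ($($tts:tt)*) => {
        tags!(@acc {} $($tts)*)
    };
}

/// Expands at compile time to a single `&'static str`.
pub fn demo() -> &'static str {
    tags!(<p>"Hello, " <b>"world"</b></p>)
}
```

Here `demo()` returns `"<p>Hello, <b>world</b></p>"`. Each recursive call strips one recognisable prefix from the front of the input and appends the matching string pieces to the accumulator inside the braces, which is why the number of recursion levels grows with the length of the input.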
This has four problems:
1) This chews through recursion levels like crazy. If you run out, users will need to globally up the recursion limit.
2) Macros like this are slow.
3) Error reporting sucks. Although this will check the closing tags match the corresponding opening tags, problems aren't reported at any particular location in the invocation.
4) You still can't avoid needing to use string literals. You cannot match an expression that is followed by < or another expression, so matching the strings must be the (sole) fallback rule.
So you can remove the delimiters, but I wouldn't recommend it. Just quote the HTML like a sane person.
As an aside, here is an alternative version of the macro with a slightly different structure that factors out the cmp macro, and is easier to extend for elements without closing tags. Note that I did not write this version.

Related

Writing simple parser in Perl: having lexer output, where to go next?

I'm trying to write a simple data manipulation language in Perl (read-only, it's meant to transform SQL-inspired queries into filters and properties to use with vSphere Perl API: http://pubs.vmware.com/vsphere-60/topic/com.vmware.perlsdk.pg.doc/viperl_advancedtopics.5.1.html_)
I currently have something similar to lexer output if I understand it properly - a list of tokens like this (Data::Dumper prints array of hashes):
$VAR1 = {
'word' => 'SHOW',
'part' => 'verb',
'position' => 0
};
$VAR2 = {
'part' => 'bareword',
'word' => 'name,',
'position' => 1
};
$VAR3 = {
'word' => 'cpu,',
'part' => 'bareword',
'position' => 2
};
$VAR4 = {
'word' => 'ram',
'part' => 'bareword',
'position' => 3
};
Now what I'd like to do is to build a syntax tree. The documentation I've seen so far is mostly on using modules and generating grammars from BNF, but at the moment I can't wrap my head around it.
I'd like to tinker with relatively simple procedural code, probably recursive, to make some ugly implementation myself.
What I'm currently thinking about is building a string of $token->{'part'}s like this:
my $parts = 'verb bareword bareword ... terminator';
and then running a big and ugly regular expression against it, (ab)using Perl's capability to embed code into regular expressions: http://perldoc.perl.org/perlretut.html#A-bit-of-magic:-executing-Perl-code-in-a-regular-expression:
$parts =~ /
^verb(?{ do_something_smart })\s # Statement always starts with a verb
(bareword\s(?{ do_something_smart }))+ # Followed by one or more barewords
| # Or
# Other rules duct taped here
/x;
Whatever I've found so far requires solid knowledge of CS and/or linguistics, and I'm failing to even understand it.
What should I do about lexer output to start understanding and tinker with proper parsing? Something like 'build a set of temporary hashes representing smaller part of statement' or 'remove substrings until the string is empty and then validate what you get'.
I'm aware of the Dragon Book and SICP, but I'd like something lighter at this time.
Thanks!
As mentioned in a couple of comments above, but here again as a real answer:
You might like Parser::MGC. (Disclaimer: I'm the author of Parser::MGC)
Start by taking your existing (regexp?) definitions of various kinds of token, and turn them into "token_..." methods by using the generic_token method.
From here, you can start to build up methods to parse larger and larger structures of your grammar, by using the structure-building methods.
As for actually building an AST: it's probably simplest at first to emit HASH references with keys containing named parts of your structure. It's hard to tell a grammatical structure from the example given in your question, but you might for instance have a concept of a "command" that is a "verb" followed by some "nouns". You might parse that using:
sub parse_command
{
my $self = shift;
my $verb = $self->token_verb;
my $nouns = $self->sequence_of( sub { $self->token_noun } );
# $nouns here will be an ARRAYref
return { type => "command", verb => $verb, nouns => $nouns };
}
It's usually around this point in writing a parser that I decide I want some actual typed objects instead of mere hash references. One easy way to do this is via another of my modules, Struct::Dumb:
use Struct::Dumb qw( -named_constructors );
struct Command => [qw( verb nouns )];
...
return Command( verb => $verb, nouns => $nouns );

Join attempt throwing exceptions

I'm sure I'm overlooking something glaringly obvious and I apologize for the newbie question, but I've spent several hours back and forth through documentation for DBIx::Class and Catalyst and am not finding the answer I need...
What I'm trying to do is automate creation of sub-menus based on the contents of my database. I have three tables in the database to do so: maps (in which sub-menu items are found), menus (contains names of top-level menus), maps_menus (assigns maps to top-level menus). I've written a subroutine to return a hash of resultsets, with the plan of using a Template Toolkit nested loop to build the top-level and sub-menus.
Basically, for each top-level menu in menus, I'm trying to run the following query and (eventually) build a sub-menu based on the result:
select * FROM maps JOIN maps_menus ON maps.id_maps = maps_menus.id_maps WHERE maps_menus.id_menus = (current id_menus);
Here is the subroutine, located in lib/MyApp/Schema/ResultSet/Menus.pm
# Build a hash of hashes for menu generation
sub build_menu {
my ($self, $maps, $maps_menus) = @_;
my %menus;
while (my $row = $self->next) {
my $id = $row->get_column('id_menus');
my $name = $row->get_column('name');
my $sub = $maps_menus->search(
{ 'id_maps' => $id },
{ join => 'maps',
'+select' => ['maps.id_maps'],
'+as' => ['id_maps'],
'+select' => ['maps.name'],
'+as' => ['name'],
'+select' => ['maps.map_file'],
'+as' => ['map_file']
}
);
$menus{$name} = $sub;
# See if it worked...
print STDERR "$name\n";
while (my $m = $sub->next) {
my $m_id = $m->get_column('id_maps');
my $m_name = $m->get_column('name');
my $m_file = $m->get_column('map_file');
print STDERR "\t$m_id, $m_name, $m_file\n";
}
}
return \%menus;
}
I am calling this from lib/MyApp/Controller/Maps.pm thusly...
$c->stash(menus => [$c->model('DB::Menus')->build_menu($c->model('DB::Map'), $c->model('DB::MapsMenus'))]);
When I attempt to pull up the page, I get all sorts of exceptions, the top-most of which is:
[error] No such relationship maps on MapsMenus at /home/catalyst/perl5/lib/perl5/DBIx/Class/Schema.pm line 1078
Which, as far as I can tell, originates from the call to $sub->next. I take this as meaning I'm doing my query incorrectly and not getting the results I think I should be. However, I'm not sure what I'm missing.
I found the following lines, defining the relationship to maps, in lib/MyApp/Schema/Result/MapsMenus.pm
__PACKAGE__->belongs_to(
"id_map",
"MyApp::Schema::Result::Map",
{ id_maps => "id_maps" },
{ is_deferrable => 1, on_delete => "CASCADE", on_update => "CASCADE" },
);
...and in lib/MyApp/Schema/Result/Map.pm
__PACKAGE__->has_many(
"maps_menuses",
"MyApp::Schema::Result::MapsMenus",
{ "foreign.id_maps" => "self.id_maps" },
{ cascade_copy => 0, cascade_delete => 0 },
);
No idea why it's calling it "maps_menuses" -- that was generated by Catalyst. Could that be the problem?
Any help would be greatly appreciated!
I'd suggest using prefetch of the two relationships which form the many-to-many relationship helper and maybe using HashRefInflator if you don't need access to the row objects.
Note that Catalyst doesn't generate a DBIC schema (DBIC is, by the way, the official abbreviation for DBIx::Class; DBIx is a whole namespace); SQL::Translator or DBIx::Class::Schema::Loader do. Look at the docs of the module you've used to find out how to influence its naming.
Also feel free to change the names if they don't fit you.

What does `error: expected open delimiter` when calling rust macro mean?

I have such a macro:
macro_rules! expect_token (
([$($token:matchers, $result:expr)|+] <= $tokens:ident, $parsed_tokens:ident, $error:expr) => (
match $tokens.pop() {
$(
Some($token) => {
$parsed_tokens.push($token);
$result
},
)+
None => {
$parsed_tokens.reverse();
$tokens.extend($parsed_tokens.into_iter());
return NotComplete;
},
_ => return error(expr)
}
);
)
When I call it with expect_token!([Ident(name), name] <= tokens, parsed_tokens, "expected function name in prototype"); I get the error "error: expected open delimiter".
What does this error mean and what am I doing wrong?
P.S. If you are wondering what is the definition of identifiers like NotComplete, you can look at https://github.com/jauhien/iron-kaleidoscope/blob/master/src/parser.rs, but it is not relevant for this question as far as I understand, as the problem is not with the macro body, but with its invocation.
OK, I have found the answer: matchers in a macro invocation should be enclosed in parentheses. The problem was my misunderstanding of matchers as the left-hand side of match arms, while they are actually the left-hand side of the => in macro rules, which is clearly stated in the documentation.
P.S. As for the whole macro I gave as an example, it is wrong anyway. :)
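For comparison, in current Rust a pattern is usually passed to a macro_rules macro through the `pat` fragment specifier. The following is a minimal sketch along those lines, not the asker's corrected macro. It assumes a hypothetical `Token` enum and a simplified rule: pairs are written as `pattern => result`, the `parsed_tokens` bookkeeping is dropped, and errors are returned as plain strings instead of the `NotComplete` and `error` items from the linked parser.

```rust
// Hypothetical token type standing in for the real one in the linked parser.
#[derive(Debug)]
pub enum Token {
    Ident(String),
    Number(i64),
}

// Simplified variant of `expect_token!`: each `pattern => result` pair
// becomes one `Some(pattern)` arm of the generated `match`.
macro_rules! expect_token {
    ([$($token:pat => $result:expr),+] <= $tokens:ident, $error:expr) => {
        match $tokens.pop() {
            $( Some($token) => Ok($result), )+
            // No tokens left at all.
            None => Err("not complete"),
            // Some token that none of the given patterns accepted.
            _ => Err($error),
        }
    };
}

// Hypothetical caller: pops one token and expects it to be an identifier.
pub fn parse_name(mut tokens: Vec<Token>) -> Result<String, &'static str> {
    expect_token!(
        [Token::Ident(name) => name] <= tokens,
        "expected function name in prototype"
    )
}
```

The binding `name` introduced by the pattern is visible in the result expression because both are written at the call site, so they share the same hygiene context.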

Zend Form TextArea with countdown characters and StringLength Validator - different calc

In my form I have a Zend_Form_Element_Textarea. For this element I add a StringLength validator like this:
$this->addElement('textarea', 'text', array('label' => 'F_MESSAGE_SMS_TEXT', 'required' => true, 'class' => 'fieldLiveLimit', 'limit' => 160));
$this->text->addValidator('StringLength', false, array('max' => 160));
and I use a JavaScript snippet to show a live countdown of the remaining characters:
//Text field or text area limit - get limit by field parameter
$(".fieldLiveLimit").each(function () {
var characters = $(this).attr('limit');
var remaining = calcDifference(characters, $(this).val());
if ($('.limitCounter').length > 0) {
$(this).after($('.limitCounter').first().clone());
$(this).next('.limitCounter').children('span').html(remaining);
} else {
$(this).after($("<div class='limitCounter'>" + translate('L_LIMIT_COUNTER', [remaining]) + "</div>"));
}
checkClassCounter(remaining, $(this));
$(this).bind('textchange', function (event, previousText) {
remaining = calcDifference(characters, $(this).val());
checkClassCounter(remaining, $(this));
if ($(this).val().length > characters) {
$(this).val($(this).val().substr(0, characters));
} else {
$(this).next('.limitCounter').children('span').html(remaining);
}
});
function calcDifference(characters, value) {
return characters - parseInt(value.length);
}
function checkClassCounter(remaining, element) {
if (parseInt(element.val().length) == 0) {
element.next(".limitCounter").hide();
} else {
element.next(".limitCounter").show();
if (remaining <= 10) {
element.next(".limitCounter").addClass('red-message');
} else {
element.next(".limitCounter").removeClass('red-message');
}
}
}
});
This works well, except for one thing: if the text area contains line breaks, the Zend validator counts each line break as two characters, while my JS script counts it as one.
Which one is wrong? I think it is the Zend validator, but that seems strange, so I am asking you.
It has to do with the line breaks, as user Pankrates already pointed out in his comment.
In fact, this problem is a lot more complex than it seems at first, because it has at least two dimensions:
1) jQuery strips the carriage return character in the val() function: "At present, using .val() on textarea elements strips carriage return characters from the browser-reported value." (jQuery documentation)
2) Browsers are inconsistent in how they count \r\n line breaks. See here or here for related questions on SO. However, on all browsers I have installed on my system (Firefox 20.0 and Chrome 26.0), \r\n is counted as two characters, so I cannot confirm this.
See this little code snippet for a demonstration:
<?php
$str1 = "test\nstring";
$str2 = "test\r\nstring";
?>
<textarea id="text1"><?php echo $str1 ?></textarea>jQuery: <span id="jquery1"></span>, JS: <span id="js1"></span>, PHP: <?php echo iconv_strlen($str1) ?>
<textarea id="text2"><?php echo $str2 ?></textarea>jQuery: <span id="jquery2"></span>, JS: <span id="js2"></span>, PHP: <?php echo iconv_strlen($str2) ?>
<script type="text/javascript">
$(document).ready(function() {
$("#jquery1").text($("#text1").val().length);
$("#js1").text("<?php echo str_replace(array("\n", "\r"), array('\n', '\r'), $str1) ?>".length);
$("#jquery2").text($("#text2").val().length);
$("#js2").text("<?php echo str_replace(array("\n", "\r"), array('\n', '\r'), $str2) ?>".length);
});
</script>
For the first box it gives me jQuery: 11, JS: 11, PHP: 11, but for the second box I get jQuery: 11, JS: 12, PHP: 12.
There are several solutions I can think of (none of which is ideal):
1) Use a Zend_Filter_PregReplace in your form to replace all \r\n with \n. Pro: Counting will be consistent with that of jQuery's val(), and it is relatively easy. Con: You are destroying the user's line breaks, which might lead to unwanted results.
2) Decorate the Zend_Validate_StringLength so that you can replace \r\n by \n in the isValid() method. Pro: Will preserve the user's line breaks. Con: You might get a valid result that is longer than 160 characters, because \r\n is counted as one character, and you need to introduce a new class.
3) Use jQuery's textarea valHooks to replace all line breaks with \r\n. Pro: Simple. Con: If you have users that have \n as line break character, it will again give you inconsistent results.
I hope this answer shows you some directions on how you can tackle this situation depending on your app's context.

Succinct MooseX::Declare method signature validation errors

I've been a proponent of adopting Moose (and MooseX::Declare) at work for several months. The style it encourages will really help the maintainability of our codebase, but not without some initial cost of learning new syntax, and especially in learning how to parse type validation errors.
I've seen discussion online of this problem, and thought I'd post a query to this community for:
a) known solutions
b) discussion of what validation error messages should look like
c) propose a proof of concept that implements some ideas
I'll also contact the authors, but I've seen some good discussion on this forum too, so I thought I'd post something public.
#!/usr/bin/perl
use MooseX::Declare;
class Foo {
has 'x' => (isa => 'Int', is => 'ro');
method doit( Int $id, Str :$z, Str :$y ) {
print "doit called with id = " . $id . "\n";
print "z = " . $z . "\n";
print "y = " . $y . "\n";
}
method bar( ) {
$self->doit(); # 2, z => 'hello', y => 'there' );
}
}
my $foo = Foo->new( x => 4 );
$foo->bar();
Note the mismatch in the call to Foo::doit with the method's signature.
The error message that results is:
Validation failed for 'MooseX::Types::Structured::Tuple[MooseX::Types::Structured::Tuple[Object,Int],MooseX::Types::Structured::Dict[z,MooseX::Types::Structured::Optional[Str],y,MooseX::Types::Structured::Optional[Str]]]' failed with value [ [ Foo=HASH(0x2e02dd0) ], { } ], Internal Validation Error is: Validation failed for 'MooseX::Types::Structured::Tuple[Object,Int]' failed with value [ Foo{ x: 4 } ] at /usr/local/share/perl/5.10.0/MooseX/Method/Signatures/Meta/Method.pm line 441
MooseX::Method::Signatures::Meta::Method::validate('MooseX::Method::Signatures::Meta::Method=HASH(0x2ed9dd0)', 'ARRAY(0x2eb8b28)') called at /usr/local/share/perl/5.10.0/MooseX/Method/Signatures/Meta/Method.pm line 145
Foo::doit('Foo=HASH(0x2e02dd0)') called at ./type_mismatch.pl line 15
Foo::bar('Foo=HASH(0x2e02dd0)') called at ./type_mismatch.pl line 20
I think that most agree that this is not as direct as it could be. I've implemented a hack in my local copy of MooseX::Method::Signatures::Meta::Method that yields this output for the same program:
Validation failed for
'[[Object,Int],Dict[z,Optional[Str],y,Optional[Str]]]' failed with value [ [ Foo=HASH(0x1c97d48) ], { } ]
Internal Validation Error:
'[Object,Int]' failed with value [ Foo{ x: 4 } ]
Caller: ./type_mismatch.pl line 15 (package Foo, subroutine Foo::doit)
The super-hacky code that does this is
if (defined (my $msg = $self->type_constraint->validate($args, \$coerced))) {
if( $msg =~ /MooseX::Types::Structured::/ ) {
$msg =~ s/MooseX::Types::Structured:://g;
$msg =~ s/,.Internal/\n\nInternal/;
$msg =~ s/failed.for./failed for\n\n /g;
$msg =~ s/Tuple//g;
$msg =~ s/ is: Validation failed for/:/;
}
my ($pkg, $filename, $lineno, $subroutine) = caller(1);
$msg .= "\n\nCaller: $filename line $lineno (package $pkg, subroutine $subroutine)\n";
die $msg;
}
[Note: With a few more minutes of crawling the code, it looks like MooseX::Meta::TypeConstraint::Structured::validate is a little closer to the code that should be changed. In any case, the question about the ideal error message, and whether anyone is actively working on or thinking about similar changes stands.]
Which accomplishes 3 things:
1) Less verbose, more whitespace (I debated including s/Tuple//, but am sticking with it for now)
2) Including calling file/line (with brittle use of caller(1))
3) die instead of confess -- since as I see it the main advantage of confess was finding the user's entry point into the typechecking anyway, which we can achieve in less verbose ways
Of course I don't actually want to support this patch. My question is: What is the best way of balancing completeness and succinctness of these error messages, and are there any current plans to put something like this in place?
I'm glad you like MooseX::Declare. However, the method signature validation
errors you're talking about aren't really from there, but from
MooseX::Method::Signatures, which in turn uses MooseX::Types::Structured for
its validation needs. Every validation error you currently see comes unmodified
from MooseX::Types::Structured.
I'm also going to ignore the stack-trace part of the error message. I happen to
find them incredibly useful, and so does the rest of the Moose cabal. I'm not going
to remove them by default.
If you want a way to turn them off, Moose needs to be changed to throw exception
objects instead of strings for type-constraint validation errors and possibly
other things. Those could always capture a backtrace, but the decision on
whether or not to display it, or how exactly to format it when displaying, could
be made elsewhere, and the user would be free to modify the default behaviour -
globally, locally, lexically, whatever.
What I'm going to address is building the actual validation error messages for
method signatures.
As pointed out, MooseX::Types::Structured does the actual validation
work. When something fails to validate, it's its job to raise an exception. This
exception currently happens to be a string, so it's not all that useful when
wanting to build beautiful errors, so that needs to change, similar to the issue
with stack traces above.
Once MooseX::Types::Structured throws structured exception objects, which might
look somewhat like
bless({
type => Tuple[Tuple[Object,Int],Dict[z,Optional[Str],y,Optional[Str]]],
err => [
0 => bless({
type => Tuple[Object,Int],
err => [
0 => undef,
1 => bless({
type => Int,
err => bless({}, 'ValidationError::MissingValue'),
}, 'ValidationError'),
],
}, 'ValidationError::Tuple'),
1 => undef,
],
}, 'ValidationError::Tuple')
we would have enough information available to actually correlate individual
inner validation errors with parts of the signature in MooseX::Method::Signatures. In the above example, and
given your (Int $id, Str :$z, Str :$y) signature, it'd be easy enough to know
that the very inner ValidationError::MissingValue for the second element of the
tuple for positional parameters was supposed to provide a value for $id, but
couldn't.
Given that, it'll be easy to generate errors such as
http://files.perldition.org/err1.png
or
http://files.perldition.org/err2.png
which is kind of what I'm going for, instead of just formatting the horrible
messages we have right now more nicely. However, if one wanted to do that, it'd
still be easy enough once we have structured validation exceptions instead of
plain strings.
None of this is actually hard - it just needs doing. If anyone feels like helping
out with this, come talk to us in #moose on irc.perl.org.
Method::Signatures::Modifiers is a package which hopes to fix some of the problems of MooseX::Method::Signatures. Simply use it to overload.
use MooseX::Declare;
use Method::Signatures::Modifiers;
class Foo
{
method bar (Int $thing) {
# this method is declared with Method::Signatures instead of MooseX::Method::Signatures
}
}