Nemerle Custom Operator Problem - macros

What I would like to be able to write in my code is the following.
c² = a² + b²
To begin with, I tried creating a macro for ²:
macro @² (x)
syntax (x,"²")
{
<[
($x * $x)
]>
}
But I get "expecting an identifier" errors at the (x). So I tried
macro @s (x)
syntax (x,"²")
{
<[
($x * $x)
]>
}
Now I get an "Unsupported Syntax Token" error at the "²".
So I ask:
1. Is it possible to write the operator ²?
2. What are the supported syntax tokens?

Currently, any character with an ASCII code lower than 255 and the following characters are valid for an operator: '=', '<', '>', '@', '^', '&', '-', '+', '|', '*', '/', '$', '%', '!', '?', '~', '.', ':', '#', '\', '`', '(', ')', ';', '[', ']'.
We can add "²" too, but maybe a more generic approach would be better.

Related

how can I partition a line into code and comment using a single regex in perl?

I want to read through a text file and partition each line into the following three variables. Each variable must be defined, although it might be equal to the empty string.
$a1code: all characters up to and not including the first non-escaped percent sign. If there is no non-escaped percent sign, this is the entire line. As we see in the example below, this also could be the empty string in a line where the following two variables are non-empty.
$a2boundary: the first non-escaped percent sign, if there is one.
$a3cmnt: any characters after the first non-escaped percent sign, if there is one.
The script below accomplishes this but requires several lines of code, two hashes, and a composite regex, that is, 2 regex combined by |.
The composite seems necessary because the first clause,
(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)
does not match a line that is pure code, no comment.
Is there a more elegant way, using a single regex and fewer steps?
In particular, is there a way to dispense with the %match hash and somehow
fill the %+ hash with all three variables in a single step?
#!/usr/bin/env perl
use strict; use warnings;
print join('', 'perl ', $^V, "\n",);
use Data::Dumper qw(Dumper); $Data::Dumper::Sortkeys = 1;
my $count=0;
while(<DATA>)
{
$count++;
print "$count\t";
chomp;
my %match=(
a2boundary=>'',
a3cmnt=>'',
);
print "|$_|\n";
if($_=~/^(?<a1code>.*?)(?<a2boundary>(?<!\\)%)(?<a3cmnt>.*)|(?<a1code>.*)/)
{
print "from regex:\n";
print Dumper \%+;
%match=(%match,%+,);
}
else
{
die "no match? coding error, should never get here";
}
if(scalar keys %+ != scalar keys %match)
{
print "from multiple lines of code:\n";
print Dumper \%match;
}
print "------------------------------------------\n";
}
__DATA__
This is 100\% text and below you find an empty line.

abba 5\% %comment 9\% %Borgia
%all comment
%
Result:
perl v5.34.0
1 |This is 100\% text and below you find an empty line. |
from regex:
$VAR1 = {
'a1code' => 'This is 100\\% text and below you find an empty line. '
};
from multiple lines of code:
$VAR1 = {
'a1code' => 'This is 100\\% text and below you find an empty line. ',
'a2boundary' => '',
'a3cmnt' => ''
};
------------------------------------------
2 ||
from regex:
$VAR1 = {
'a1code' => ''
};
from multiple lines of code:
$VAR1 = {
'a1code' => '',
'a2boundary' => '',
'a3cmnt' => ''
};
------------------------------------------
3 |abba 5\% %comment 9\% %Borgia|
from regex:
$VAR1 = {
'a1code' => 'abba 5\\% ',
'a2boundary' => '%',
'a3cmnt' => 'comment 9\\% %Borgia'
};
------------------------------------------
4 |%all comment|
from regex:
$VAR1 = {
'a1code' => '',
'a2boundary' => '%',
'a3cmnt' => 'all comment'
};
------------------------------------------
5 |%|
from regex:
$VAR1 = {
'a1code' => '',
'a2boundary' => '%',
'a3cmnt' => ''
};
------------------------------------------
You can use the following:
my ($a1code, $a2boundary, $a3cmnt) =
/
^
( (?: [^\\%]+ | \\. )* )
(?: (%) (.*) )?
\z
/sx;
It does not consider % escaped in abc\\%def since the preceding \ is escaped.
It requires no backtracking, and it matches every input except one that ends in a lone (unescaped) backslash.
$a1code is always a string. It can be zero characters long (when the input is an empty string and when % is the first character), or the entire input string (when there is no unescaped %).
However, $a2boundary and $a3cmnt are only defined if there's an unescaped %. In other words, $a2boundary is equivalent to defined($a3cmnt) ? '%' : undef.
Explanation: [^\\%]+ matches non-escaped characters other than \ and %. \\. matches escaped characters. So (?: [^\\%]+ | \\. )* gets us the prefix, or the entire string if there are no unescaped %.
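Since the question asks for all three variables to be defined, the regex above can be wrapped in a small sub that defaults the missing parts to the empty string. This is only a sketch: the sub name split_code_comment is made up for illustration, and the fallback for a line ending in a lone backslash (which the regex does not match) is an assumption about what is wanted in that case.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex above; the sub name is illustrative.
sub split_code_comment {
    my ($line) = @_;
    my ($code, $boundary, $comment) = $line =~ /
        ^
        ( (?: [^\\%]+ | \\. )* )   # code: plain characters or escaped pairs
        (?: (%) (.*) )?            # optional boundary and comment
        \z
    /sx;
    # A line ending in a lone backslash does not match; treat it as all code.
    $code = $line unless defined $code;
    # Boundary and comment are undef when there is no unescaped %.
    return ($code, $boundary // '', $comment // '');
}

for my $line ('abba 5\% %comment 9\% %Borgia', 'no comment here', '%') {
    my ($code, $boundary, $comment) = split_code_comment($line);
    print "|$code|$boundary|$comment|\n";
}
```

The defined-or operator (//) needs Perl 5.10 or later, which the perl v5.34.0 shown in the question satisfies.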
What about cases like this\\%string where the backslash before the percent sign is itself escaped?
Consider something like this, which, instead of trying to use a regular expression to split the string into three groups, uses one to find where it should be split, and substr to do the actual splitting:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
sub splitter {
my $line = shift;
if ($line =~ /
# Match either
(?<!\\)% # A % not preceded by a backslash
| # or
(?<=[^\\])(?:\\\\)+\K% # Any even number of backslashes followed by a %
/x) {
return (substr($line, 0, $-[0]), '%', substr($line, $+[0]));
} else {
return ($line, '', '');
}
}
while (<DATA>) {
chomp;
# Assign to an array instead of individual scalars for demonstration purposes
my @vals = splitter $_;
print Dumper(\@vals);
}
__DATA__
This is 100\% text and below you find an empty line.

abba 5\% %comment 9\% %Borgia
%all comment
%
a tricky\\%test % case
another \\\%one % to mess with you

Parse a single-quoted string using Marpa::R2 in Perl

How can I parse a single-quoted string using Marpa::R2?
In my code below, single-quoted strings come back with a '\' added after parsing.
Code:
use strict;
use Marpa::R2;
use Data::Dumper;
my $grammar = Marpa::R2::Scanless::G->new(
{ default_action => '[values]',
source => \(<<'END_OF_SOURCE'),
lexeme default = latm => 1
:start ::= Expression
# include begin
Expression ::= Param
Param ::= Unquoted
| ('"') Quoted ('"')
| (') Quoted (')
:discard ~ whitespace
whitespace ~ [\s]+
Unquoted ~ [^\s\/\(\),&:\"~]+
Quoted ~ [^\s&:\"~]+
END_OF_SOURCE
});
my $input1 = 'foo';
#my $input2 = '"foo"';
#my $input3 = '\'foo\'';
my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
print "Trying to parse:\n$input1\n\n";
$recce->read(\$input1);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);
Outputs:
Trying to parse:
foo
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
"foo"
Output:
$VAR1 = [
[
'foo'
]
];
Trying to parse:
'foo'
Output:
$VAR1 = [
[
'\'foo\''
]
]; (don't want it to be parsed like this)
Above are the outputs for all three inputs. I don't want the third one to come back with the '\' and the single quotes; I want it to be parsed like the second output. Please advise.
Ideally, it should just pick the content between single quotes according to Param ::= (') Quoted (')
The other answer regarding Data::Dumper output is correct. However, your grammar does not work the way you expect it to.
When you parse the input 'foo', Marpa will consider the three Param alternatives. The predicted lexemes at that position are:
Unquoted ~ [^\s\/\(\),&:\"~]+
'"'
') Quoted ('
Yes, the last is literally ) Quoted (, not anything containing a single quote.
Even if it were ([']) Quoted ([']), the Unquoted lexeme would still match the entire input, including the single quotes, because of longest token matching.
What would happen for an input like " foo " (with double quotes)? Now, only the '"' lexeme would match, then any whitespace would be discarded, then the Quoted lexeme matches, then any whitespace is discarded, then closing " is matched.
To prevent this whitespace-skipping behaviour and to prevent the Unquoted rule from being preferred due to LATM, it makes sense to describe quoted strings as lexemes. For example:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*
These lexemes will then include any quotes and escapes, so you need to post-process the lexeme contents. You can either do this using the event system (which is conceptually clean, but a bit cumbersome to implement), or adding an action that performs this processing during parse evaluation.
Since lexemes cannot have actions, it is usually best to add a proxy production:
Param ::= Unquoted | Quoted
Unquoted ~ [^'"]+
Quoted ::= Quoted_Lexeme action => process_quoted
Quoted_Lexeme ~ DQ | SQ
DQ ~ '"' DQ_Body '"'
DQ_Body ~ [^"]*
SQ ~ ['] SQ_Body [']
SQ_Body ~ [^']*
The action could then do something like:
sub process_quoted {
my (undef, $s) = @_;
# remove delimiters from double-quoted string
return $1 if $s =~ /^"(.*)"$/s;
# remove delimiters from single-quoted string
return $1 if $s =~ /^'(.*)'$/s;
die "String was not delimited with single or double quotes";
}
Your result doesn't contain \', it contains '. Dumper merely formats the result like that so it's clear what's inside the string and what isn't.
You can test this behavior for yourself:
use Data::Dumper;
my $tick = chr(39);
my $back = chr(92);
print "Tick Dumper: " . Dumper($tick);
print "Tick Print: " . $tick . "\n";
print "Backslash Dumper: " . Dumper($back);
print "Backslash Print: " . $back . "\n";
You can see a demo here: https://ideone.com/d1V8OE
If you don't want the output to contain single quotes, you'll probably need to remove them from the input yourself.
I am not so familiar with Marpa::R2, but you could try using an action on the Expression rule:
Expression ::= Param action => strip_quotes
Then, implement a simple quote stripper like:
sub MyActions::strip_quotes {
# $_[1] is the array ref produced for Param; strip the surrounding single quotes
$_[1][0] =~ s/^'|'$//gr;
}

Exception handling in parser implemented using Marpa::R2

I have implemented a parser using Marpa::R2. The code is shown below.
I have a large number of test cases in a .t file, which I run to test my parser. If an exception arises for any input expression, testing shouldn't stop midway: it should give a proper error message for the input that failed (using exception handling), and the rest of the test cases should still run.
I want to add exception handling to this parser. If any sort of exception arises, even while tokenizing the input expression, I want to show the user an appropriate message with the position, the string, and any other details that show where the error occurred. Please help.
use strict;
use Marpa::R2;
use Data::Dumper;
my $grammar = Marpa::R2::Scanless::G->new({
default_action => '[values]',
source => \(<<'END_OF_SOURCE'),
lexeme default = latm => 1
:start ::= expression
expression ::= expression OP expression
expression ::= expression COMMA expression
expression ::= func LPAREN PARAM RPAREN
expression ::= PARAM
PARAM ::= STRING | REGEX_STRING
REGEX_STRING ::= '"' QUOTED_STRING '"'
:discard ~ sp
sp ~ [\s]+
COMMA ~ [,]
STRING ~ [^ \/\(\),&:\"~]+
QUOTED_STRING ~ [^ ,&:\"~]+
OP ~ ' - ' | '&'
LPAREN ~ '('
RPAREN ~ ')'
func ~ 'func'
END_OF_SOURCE
});
my $input = "func(\"foo\")";
my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
print "Trying to parse:\n$input\n\n";
$recce->read(\$input);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);
I want to do proper error handling like this: http://blogs.perl.org/users/jeffrey_kegler/2012/10/a-marpa-dsl-tutorial-error-reporting-made-easy.html
I don't know how to put all of this in place.
Wrap the lines that can fail in an exception handler:
use Try::Tiny;
⋮
try {
$recce->read(\$input);
my $value_ref = ${$recce->value};
print "Output:\n".Dumper($value_ref);
} catch {
warn $_;
};
The full error message from Marpa will be in $_; it is a single long string with newlines in it. I chose to print it to STDERR with warn, and the program continues to run. As you can see in the example error message below, it contains the position where the parsing failed:
Error in SLIF parse: No lexeme found at line 1, column 5
* String before error: "fo\s
* The error was at line 1, column 5, and at character 0x006f 'o', ...
* here: o"
Marpa::R2 exception at so49932329.pl line 41.
If you need to, you could reformat it so it looks better to the user.
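As a rough sketch of such reformatting, the helper below pulls the reason, line and column out of the first line of the message with a regex. The sub name summarize_parse_error is hypothetical, and the pattern assumes the message always begins like the sample above ("Error in SLIF parse: ... at line N, column M"); other Marpa errors may be worded differently, so it falls back to the first line of whatever it receives.

```perl
use strict;
use warnings;

# Hypothetical helper: condense a Marpa::R2 error string into one line.
# Assumes the message starts like the sample shown above.
sub summarize_parse_error {
    my ($message) = @_;
    if ($message =~ /^Error in SLIF parse: (.+?) at line (\d+), column (\d+)/m) {
        return "Parse failed at line $2, column $3: $1";
    }
    # Unknown format: fall back to the first line of the message.
    my ($first_line) = split /\n/, $message;
    return $first_line // 'unknown parse error';
}

# Sample text copied from the error message shown above.
my $sample = "Error in SLIF parse: No lexeme found at line 1, column 5\n"
           . "* String before error: \"fo\\s\n";
print summarize_parse_error($sample), "\n";
```

Inside the catch block, this would be called as warn summarize_parse_error($_), "\n"; so each failing test case reports one short line and the remaining cases still run.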

Perl quotes surrounded by only string in array

I need to put single quotes around every element of an array that is not a number.
I tried the following code, but it does not work. Can anyone help me sort it out?
$data = join ',', map { /'\w+'/ } @$row[0..3];
Input/Output :
Input :
[1,string test, value test, 5]
Output:
(1,'string test', 'value test', 5)
To place '' around elements that contain no digit at all:
my $data = join ',', map { /[0-9]/ ? $_ : "'${_}'" } @$row[0..3];
Here a string such as 'string 10 test' doesn't get quoted.
Or, to leave only pure integers unquoted:
my $data = join ',', map { /[^0-9]/ ? "'${_}'" : $_ } @$row[0..3];
This also quotes strings that contain a number, like the example above.
For non-integer numbers, there is Scalar::Util::looks_like_number:
use Scalar::Util 'looks_like_number';
my $data = join ',', map { looks_like_number($_) ? $_ : "'${_}'" } @$row[0..3];
which of course works for the second case (integers) as well.
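For reference, here is the looks_like_number variant as a small self-contained script run on the input from the question. The sub name quote_non_numbers and the $row contents are illustrative; the data is simply the question's sample written as an array reference.

```perl
use strict;
use warnings;
use Scalar::Util 'looks_like_number';

# Hypothetical helper: quote every element that does not look like a number.
sub quote_non_numbers {
    return join ',', map { looks_like_number($_) ? $_ : "'$_'" } @_;
}

# Sample row taken from the question's input.
my $row = [1, 'string test', 'value test', 5];
print '(', quote_non_numbers(@$row[0..3]), ")\n";
# prints (1,'string test','value test',5)
```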

how to get the required strings from a text using perl

Here is the text to trim:
/home/netgear/Desktop/WGET-1.13/wget-1.13/src/cmpt.c:388,error,resourceLeak,Resource leak: fr
From the above text I need to get the data after the first ":". How do I get 388,error,resourceLeak,Resource leak: fr?
You can use split to separate a string into a list based on a delimiter. In your case the delimiter should be a colon (:):
my @parts = split ':', $text;
As the text you want to extract can also contain a :, use the limit argument to stop after the first one:
my @parts = split ':', $text, 2;
$parts[1] will then contain the text you wanted to extract. You could also pass the result into a list, discarding the first element:
my (undef, $extract) = split ':', $text, 2;
Aside from @RobEarl's suggestion of using split, you could use a regular expression to do this.
my ($match) = $text =~ /^[^:]+:(.*?)$/;
Regular expression:
^ the beginning of the string
[^:]+ any character except: ':' (1 or more times)
: match ':'
( group and capture to \1:
.*? any character except \n (0 or more times)
) end of \1
$ before an optional \n, and the end of the string
$match will now hold the result of capture group 1:
388,error,resourceLeak,Resource leak: fr
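To check either approach against the sample line, a minimal script along these lines can be used. The sub name after_first_colon is made up for illustration; it uses the split form with a limit of 2 and returns undef when the text contains no colon at all.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the first colon.
sub after_first_colon {
    my ($text) = @_;
    # Limit of 2: split only at the first colon, keep later colons intact.
    my (undef, $rest) = split /:/, $text, 2;
    return $rest;    # undef when there is no colon in $text
}

# Sample line copied from the question.
my $text = '/home/netgear/Desktop/WGET-1.13/wget-1.13/src/cmpt.c:388,error,resourceLeak,Resource leak: fr';
print after_first_colon($text), "\n";
```

Callers that may see lines without a colon should test the result with defined before printing it.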