XText cross referencing - eclipse

I have written following grammar
Model:
package = PackageDec?
greetings+=Greeting*
usage+=Usage* ;
PackageDec:
'package' name=QualifiedName ;
Greeting:
'greet' name=ID '{' ops += Operation* '}' ;
Operation:
'op' name=ID ('(' ')' '{' '}')? ;
QualifiedName:
ID ('.' ID)*;
Usage:
'use';
With above i can write following script.
package p1.p2
greet G1 {op f1 op f2 }
Now i need to write something like this:
package p1.p2
greet G1 {op f1 op f2 op f3}
use p1.p2.G1.f1
use p1.p2.G1
use p1.p2.G1.f3
To support that i changed Usage RULE like this
Usage:
'use' head=[Greet|QualifiedName] =>('.' tail=[Operation])?
However when i generate xtext artifacts it is complaining about multiple alternatives.
Please let me know how to write correct grammar rule for this.

This is because QualifiedName consumes dots (.). Adding ('.' ...)? makes two alternatives. Consider input
a.b.c
This could be parsed as
head="a" tail = "b.c"
head="a.b" tail = "c"
If I understand your intention of using predicate => right, than you just have to replace
head=[Greet|QualifiedName]
with
head=[Greet]
In this case however you will not be able to parse references with dots.
As a solution I would recommend to substitute your dot with some other character. For example with colon:
Usage:
'use' head=[Greet|QualifiedName] (':' tail=[Operation])?

Related

Include commentary to the matlab grammar using antlr4

could anybody help me with these two problems please?
First one is almost solved for me by question regular expression for multiline commentary in matlab , but I do not know how exactly I should use ^.*%\{(?:\R(?!.*%\{).*)*\R\h*%\}$ or where in grammar if I want use is with antlr4. I have been using matlab grammar from this source.
Second one is related to another type of commentary in matlab which is a = 3 % type any ascii I want.... In this case worked, when I insert label alternative to the rule context unary_expression in this form:
unary_expression
: postfix_expression
| unary_operator postfix_expression
| postfix_expression COMMENT
;
where COMMENT: '%' [ a-zA-Z0-9]*;, but when I use [\x00-\x7F] instead of [ a-zA-Z0-9]* (what I found here) parsing goes wrong, see example bellow:
INPUT FOR PARSER: a = 3 % $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa
ANTLR OUTPUT : Exception in thread "main" java.lang.RuntimeException: set is empty
at org.antlr.v4.runtime.misc.IntervalSet.getMaxElement(IntervalSet.java:421)
at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:169)
at org.antlr.v4.runtime.atn.ATNSerializer.getSerialized(ATNSerializer.java:601)
at org.antlr.v4.Tool.generateInterpreterData(Tool.java:745)
at org.antlr.v4.Tool.processNonCombinedGrammar(Tool.java:400)
at org.antlr.v4.Tool.process(Tool.java:361)
at org.antlr.v4.Tool.processGrammarsOnCommandLine(Tool.java:328)
at org.antlr.v4.Tool.main(Tool.java:172)
line 1:9 token recognition error at: '$'
line 1:20 token recognition error at: '"'
line 1:21 token recognition error at: '!'
line 1:22 token recognition error at: '"'
line 1:38 token recognition error at: '$'
line 1:43 token recognition error at: '"'
line 1:10 missing {',', ';', CR} at 'L'
line 1:32 missing {',', ';', CR} at '3'
Can anybody please tell me what have I done wrong? And what is the best practice for this problem? (I am not exactly regex person...)
Let's take the simple one first.
this looks (to me) like a typical "comment everything through the end of the line" comment.
Assuming I'm correct, then best not to consider what all the valid characters are that might be contained, but rather to think about what not to consume.
Try: COMMENT: '%' ~[\r\n]* '\r'? '\n';
(I notice that you did not include anything in your rule to terminate it at the end of the line, so I've added that).
This basically says: once I see a % consume everything that is not a \r or `nand stop when you see an option\rfollowed by a required\n'.
Generally, comments can occur just about anywhere within a grammar structure, so it's VERY useful to "shove the off to the side" rather than inject them everywhere you allow them in the grammar.
So, a short grammar:
grammar test
;
test: ID EQ INT;
EQ: '=';
INT: [0-9]+;
COMMENT: '%' ~[\r\n]* '\r'? '\n' -> channel(HIDDEN);
ID: [a-zA-Z]+;
WS: [ \t\r\n]+ -> skip;
You'll notice that I removed the COMMENT element from the test rule.
test file:
a = 3 % $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa
(be sure to include the \n)
➜ grun test test -tree -tokens < test.txt
[#0,0:0='a',<ID>,1:0]
[#1,2:2='=',<'='>,1:2]
[#2,4:4='3',<INT>,1:4]
[#3,6:48='% $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa\n',<COMMENT>,channel=1,1:6]
[#4,49:48='<EOF>',<EOF>,2:0]
(test a = 3)
You still get a COMMENT token, it's just ignored when matching the parser rules.
Now for the multiline comments:
ANTLR uses a rather "regex-like" syntax for Lexer rules, but, don't be fooled, it's not (it's actually more powerful as it can pair up nested brackets, etc.)
From a quick reading, MatLab multiline tokens start with a %{ and consume everything until a %}. This is very similar to the prior rule, it just doesn't care about \ror\n`), so:
MLCOMMENT: '%{' .*? '%}' -> channel(HIDDEN);
Included in grammar:
grammar test
;
test: ID EQ INT;
EQ: '=';
INT: [0-9]+;
COMMENT: '%' ~[\r\n]* '\r'? '\n' -> channel(HIDDEN);
MLCOMMENT: '%{' .*? '%}' -> channel(HIDDEN);
ID: [a-zA-Z]+;
WS: [ \t\r\n]+ -> skip;
Input file:
a = 3 % $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa
%{
A whole bunch of stuff
on several
lines
%}
➜ grun test test -tree -tokens < test.txt
[#0,0:0='a',<ID>,1:0]
[#1,2:2='=',<'='>,1:2]
[#2,4:4='3',<INT>,1:4]
[#3,6:48='% $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa\n',<COMMENT>,channel=1,1:6]
[#4,50:106='%{\n A whole bunch of stuff\n on several\n lines\n%}',<MLCOMMENT>,channel=1,3:0]
[#5,108:107='<EOF>',<EOF>,8:0]
(test a = 3)

SystemVerilog stringify (`") operator and line breaks

I'm using the SystemVerilog stringify operator, `", in a macro, as below. The case is deliberately contrived to show the bug:
module my_test();
`define print(x) $fwrite(log_file, `"x`")
`define println(x) $fwrite(log_file, `"x\n`")
integer log_file;
initial begin
log_file = $fopen("result.txt", "w");
`print(A);
`print(B);
`println(C);
`println(D);
`print(E);
`print(F);
end
endmodule
This gives the output (no trailing newline):
ABC
`D
`EF
Why are there `s in the output, but only from the println?
Is this documented behaviour in the spec, or a bug in my simulator (Aldec Active-HDL)?
This is a bug in your tool. However, the second `" is not needed and gives you the results you are looking for.

Marpa parser can't seem to cope with optional first symbol?

I've been getting to grips with the Marpa parser and encountered a problem when the first symbol is optional. Here's an example:
use strict;
use warnings;
use 5.10.0;
use Marpa::R2;
use Data::Dump;
my $grammar = Marpa::R2::Scanless::G->new({source => \<<'END_OF_GRAMMAR'});
:start ::= Rule
Rule ::= <optional a> 'X'
<optional a> ~ a *
a ~ 'a'
END_OF_GRAMMAR
my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar});
dd $recce->read(\"X");
When I run this, I get the following error:
Error in SLIF parse: No lexemes accepted at line 1, column 1
* String before error:
* The error was at line 1, column 1, and at character 0x0058 'X', ...
* here: X
Marpa::R2 exception at small.pl line 20
at /usr/local/lib/perl/5.14.2/Marpa/R2.pm line 126
Marpa::R2::exception('Error in SLIF parse: No lexemes accepted at line 1, column 1\x{a}...') called at /usr/local/lib/perl/5.14.2/Marpa/R2/Scanless.pm line 1545
Marpa::R2::Scanless::R::read_problem('Marpa::R2::Scanless::R=ARRAY(0x95cbfd0)', 'no lexemes accepted') called at /usr/local/lib/perl/5.14.2/Marpa/R2/Scanless.pm line 1345
Marpa::R2::Scanless::R::resume('Marpa::R2::Scanless::R=ARRAY(0x95cbfd0)', 0, -1) called at /usr/local/lib/perl/5.14.2/Marpa/R2/Scanless.pm line 926
Marpa::R2::Scanless::R::read('Marpa::R2::Scanless::R=ARRAY(0x95cbfd0)', 'SCALAR(0x95aeb1c)') called at small.pl line 20
Perl version 5.14.2 (debian wheezy)
Marpa version 2.068000
(I see there's a brand new Marpa 2.069 that I haven't tried yet)
Is this something I'm doing wrong in my grammar?
In Marpa Scanless, your grammar has two levels: The main, high-level grammar where you can attribute actions and such, and the low-level lexing grammar. They are executed independently (which is expected if you have used traditional parser/lexers, but is very confusing when you come from regexes to Marpa).
Now on the low level grammar, Marpa recognizes your input as a single X, not “zero as and then an X”. However, the high-level grammar requires the optional a symbol to be present.
There best way around that is to make the a optional in the high-level grammar:
<optional a> ::= <many a>
<optional a> ::= # empty
<many a> ~ a* # would work the same here with "a+"
a ~ 'a'

Handling multiple return values in ANTLR

I have a simple rule in ANTLR:
title returns [ElementVector<Element> v]
#init{
$v = new ElementVector<Element>() ;
}
: '[]'
| '[' title_args {$v.add($title_args.ele);} (',' title_args {$v = $title_args.ele ;})* ']'
;
with title_args being:
title_args returns [Element ele]
: author {$ele = new Element("author", $author.text); }
| location {$ele = new Element("location", $location.text); }
;
Trying to compile that I get confronted with a 127 error in the title rule: title_args is a non-unique reference.
I've followed the solution given to another similar question in this website (How to deal with list return values in ANTLR) however it only seems to work with lexical rules.
Is there a specific way to go around it ?
Thank you,
Christos
You have 2 title_args in your expression, you need to alias them. Try this:
| '[' t1=title_args {$v.add($t1.ele);} (',' t2=title_args {$v = $t2.ele ;})* ']'
t1 and t2 are arbitrary aliases you can choose anything you want as long as they match up.
I think the problem is your reusing the title_args var. Try changing one of those variable names.
Yeah, I had the same problem.
You need to change one of the variable names; for example, do like the following:
title_args
title_args1
in your code instead of using title_args twice.
If title_args is a parser rule, then just create the same rule with the name title_args1.
So, basically there would be two rules with the same functionality.

Lisp grammar in yacc

I am trying to build a Lisp grammar. Easy, right? Apparently not.
I present these inputs and receive errors...
( 1 1)
23 23 23
ui ui
This is the grammar...
%%
sexpr: atom {printf("matched sexpr\n");}
| list
;
list: '(' members ')' {printf("matched list\n");}
| '('')' {printf("matched empty list\n");}
;
members: sexpr {printf("members 1\n");}
| sexpr members {printf("members 2\n");}
;
atom: ID {printf("ID\n");}
| NUM {printf("NUM\n");}
| STR {printf("STR\n");}
;
%%
As near as I can tell, I need a single non-terminal defined as a program, upon which the whole parse tree can hang. But I tried it and it didn't seem to work.
edit - this was my "top terminal" approach:
program: slist;
slist: slist sexpr | sexpr;
But it allows problems such as:
( 1 1
Edit2: The FLEX code is...
%{
#include <stdio.h>
#include "a.yacc.tab.h"
int linenumber;
extern int yylval;
%}
%%
\n { linenumber++; }
[0-9]+ { yylval = atoi(yytext); return NUM; }
\"[^\"\n]*\" { return STR; }
[a-zA-Z][a-zA-Z0-9]* { return ID; }
.
%%
An example of the over-matching...
(1 1 1)
NUM
matched sexpr
NUM
matched sexpr
NUM
matched sexpr
(1 1
NUM
matched sexpr
NUM
matched sexpr
What's the error here?
edit: The error was in the lexer.
Lisp grammar can not be represented as context-free grammar, and yacc can not parse all lisp code.
It is because of lisp features such as read-evaluation and programmable reader. So, in order just to read an arbitrary lisp code, you need to have a full lisp running. This is not some obscure, non-used feature, but it is actually used. E.g., CL-INTERPOL, CL-SQL.
If the goal is to parse a subset of lisp, then the program text is a sequence of sexprs.
The error is really in the lexer. Your parentheses end up as the last "." in the lexer, and don't show up as parentheses in the parser.
Add rules like
\) { return RPAREN; }
\( { return LPAREN; }
to the lexer and change all occurences of '(', ')' to LPAREN and RPAREN respectively in the parser. (also, you need to #define LPAREN and RPAREN where you define your token list)
Note: I'm not sure about the syntax, could be the backslashes are wrong.
You are correct in that you need to define a non-terminal. That would be defined as a set of sexpr. I'm not sure of the YACC syntax for that. I'm partial to ANTLR for parser generators and the syntax would be:
program: sexpr*
Indicating 0 or more sexpr.
Update with YACC syntax:
program : /* empty */
| program sexpr
;
Not in YACC, but might be helpful anyway, here's a full grammar in ANTLR v3 that works for the cases you described(excludes strings in the lexer because it's not important for this example, also uses C# console output because that's what I tested it with):
program: (sexpr)*;
sexpr: list
| atom {Console.WriteLine("matched sexpr");}
;
list:
'('')' {Console.WriteLine("matched empty list");}
| '(' members ')' {Console.WriteLine("matched list");}
;
members: (sexpr)+ {Console.WriteLine("members 1");};
atom: Id {Console.WriteLine("ID");}
| Num {Console.WriteLine("NUM");}
;
Num: ( '0' .. '9')+;
Id: ('a' .. 'z' | 'A' .. 'Z')+;
Whitespace : ( ' ' | '\r' '\n' | '\n' | '\t' ) {Skip();};
This won't work exactly as is in YACC because YACC generates and LALR parser while ANTLR is a modified recursive descent. There is a C/C++ output target for ANTLR if you wanted to go that way.
Do you neccesarily need a yacc/bison parser? A "reads a subset of lisp syntax" reader isn't that hard to implement in C (start with a read_sexpr function, dispatch to a read_list when you see a '(', that in turn builds a list of contained sexprs until a ')' is seen; otherwise, call a read_atom that collects an atom and returns it when it can no longer read atom-constituent characters).
However, if you want to be able to read arbritary Common Lisp, you'll need to (at the worst) implement a Common Lisp, as CL can modify the reader run-time (and even switch between different read-tables run-time under program control; quite handy when you're wanting to load code written in another language or dialect of lisp).
It's been a long time since I worked with YACC, but you do need a top-level non-terminal. Could you be more specific about "tried it" and "it didn't seem to work"? Or, for that matter, what the errors are?
I'd also suspect that YACC might be overkill for such a syntax-light language. Something simpler (like recursive descent) might work better.
You could try this grammar here.
I just tried it, my "yacc lisp grammar" works fine :
%start exprs
exprs:
| exprs expr
/// if you prefer right recursion :
/// | expr exprs
;
list:
'(' exprs ')'
;
expr:
atom
| list
;
atom:
IDENTIFIER
| CONSTANT
| NIL
| '+'
| '-'
| '*'
| '^'
| '/'
;