How do I parse a pseudo float number in Xtext grammar? - eclipse

I need to filter a 'reference number' of the form XX.XX, where X is any upper or lower-case letter or number (0-9). This is what I have came up with:
SCR_REF:
'Scr_Ref' ':' value=PROFILE
;
terminal PROFILE :
((CHAR|INT)(CHAR|INT)'.'(CHAR|INT)(CHAR|INT))
;
terminal CHAR returns ecore::EString : ('a'..'z'|'A'..'Z');
But his doesn't work in the generated editor. The following test entry:
Scr_Ref: 11.22
throws an error saying:
"no viable alternative at character '.' "
What I'm I doing wrong?

I think your problem is that you are using default INT in here. Both 11 and 22 is an integer by themselves. You need digits in here not Integer. Down here I made an example for you.
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
import"http://www.eclipse.org/emf/2002/Ecore" as ecore
Model:
greetings+=Greeting*;
Greeting:
'Hello' name=ID '!' "val=" val= PROFILE;
terminal PROFILE :
((CHAR|DIGIT)(CHAR|DIGIT)'.'(CHAR|DIGIT)(CHAR|DIGIT))
;
terminal DIGIT:
('0'..'9')
;
terminal CHAR returns ecore::EString :
('a'..'z'|'A'..'Z')
;
Hope this helps.

Related

Include commentary to the matlab grammar using antlr4

could anybody help me with these two problems please?
First one is almost solved for me by question regular expression for multiline commentary in matlab , but I do not know how exactly I should use ^.*%\{(?:\R(?!.*%\{).*)*\R\h*%\}$ or where in grammar if I want use is with antlr4. I have been using matlab grammar from this source.
Second one is related to another type of commentary in matlab which is a = 3 % type any ascii I want.... In this case worked, when I insert label alternative to the rule context unary_expression in this form:
unary_expression
: postfix_expression
| unary_operator postfix_expression
| postfix_expression COMMENT
;
where COMMENT: '%' [ a-zA-Z0-9]*;, but when I use [\x00-\x7F] instead of [ a-zA-Z0-9]* (what I found here) parsing goes wrong, see example bellow:
INPUT FOR PARSER: a = 3 % $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa
ANTLR OUTPUT : Exception in thread "main" java.lang.RuntimeException: set is empty
at org.antlr.v4.runtime.misc.IntervalSet.getMaxElement(IntervalSet.java:421)
at org.antlr.v4.runtime.atn.ATNSerializer.serialize(ATNSerializer.java:169)
at org.antlr.v4.runtime.atn.ATNSerializer.getSerialized(ATNSerializer.java:601)
at org.antlr.v4.Tool.generateInterpreterData(Tool.java:745)
at org.antlr.v4.Tool.processNonCombinedGrammar(Tool.java:400)
at org.antlr.v4.Tool.process(Tool.java:361)
at org.antlr.v4.Tool.processGrammarsOnCommandLine(Tool.java:328)
at org.antlr.v4.Tool.main(Tool.java:172)
line 1:9 token recognition error at: '$'
line 1:20 token recognition error at: '"'
line 1:21 token recognition error at: '!'
line 1:22 token recognition error at: '"'
line 1:38 token recognition error at: '$'
line 1:43 token recognition error at: '"'
line 1:10 missing {',', ';', CR} at 'L'
line 1:32 missing {',', ';', CR} at '3'
Can anybody please tell me what have I done wrong? And what is the best practice for this problem? (I am not exactly regex person...)
Let's take the simple one first.
this looks (to me) like a typical "comment everything through the end of the line" comment.
Assuming I'm correct, then best not to consider what all the valid characters are that might be contained, but rather to think about what not to consume.
Try: COMMENT: '%' ~[\r\n]* '\r'? '\n';
(I notice that you did not include anything in your rule to terminate it at the end of the line, so I've added that).
This basically says: once I see a % consume everything that is not a \r or `nand stop when you see an option\rfollowed by a required\n'.
Generally, comments can occur just about anywhere within a grammar structure, so it's VERY useful to "shove the off to the side" rather than inject them everywhere you allow them in the grammar.
So, a short grammar:
grammar test
;
test: ID EQ INT;
EQ: '=';
INT: [0-9]+;
COMMENT: '%' ~[\r\n]* '\r'? '\n' -> channel(HIDDEN);
ID: [a-zA-Z]+;
WS: [ \t\r\n]+ -> skip;
You'll notice that I removed the COMMENT element from the test rule.
test file:
a = 3 % $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa
(be sure to include the \n)
➜ grun test test -tree -tokens < test.txt
[#0,0:0='a',<ID>,1:0]
[#1,2:2='=',<'='>,1:2]
[#2,4:4='3',<INT>,1:4]
[#3,6:48='% $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa\n',<COMMENT>,channel=1,1:6]
[#4,49:48='<EOF>',<EOF>,2:0]
(test a = 3)
You still get a COMMENT token, it's just ignored when matching the parser rules.
Now for the multiline comments:
ANTLR uses a rather "regex-like" syntax for Lexer rules, but, don't be fooled, it's not (it's actually more powerful as it can pair up nested brackets, etc.)
From a quick reading, MatLab multiline tokens start with a %{ and consume everything until a %}. This is very similar to the prior rule, it just doesn't care about \ror\n`), so:
MLCOMMENT: '%{' .*? '%}' -> channel(HIDDEN);
Included in grammar:
grammar test
;
test: ID EQ INT;
EQ: '=';
INT: [0-9]+;
COMMENT: '%' ~[\r\n]* '\r'? '\n' -> channel(HIDDEN);
MLCOMMENT: '%{' .*? '%}' -> channel(HIDDEN);
ID: [a-zA-Z]+;
WS: [ \t\r\n]+ -> skip;
Input file:
a = 3 % $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa
%{
A whole bunch of stuff
on several
lines
%}
➜ grun test test -tree -tokens < test.txt
[#0,0:0='a',<ID>,1:0]
[#1,2:2='=',<'='>,1:2]
[#2,4:4='3',<INT>,1:4]
[#3,6:48='% $£ K JFKL£J"!"OIJ+2432 3K3KJ£$K M£"Kdsa\n',<COMMENT>,channel=1,1:6]
[#4,50:106='%{\n A whole bunch of stuff\n on several\n lines\n%}',<MLCOMMENT>,channel=1,3:0]
[#5,108:107='<EOF>',<EOF>,8:0]
(test a = 3)

How to control the number of decimals places for display purpose only using Image function in Ada?

I have the following code line in Ada,
Put_Line ("Array of " & Integer'Image (iterations)
& " is " & Long_Float'Image (sum)
& " Time = " & Duration'Image(milliS) & timescale);
The number of decimal places in sum is too long for display (not for calculations since long float is needed for sum calculations). I know that Ada has alternative way of displaying decimals using aft and fore without using the Image function but before I switch to alternative I would like to know if Image has options or other technique of displaying decimals. Does Image function has an option to display decimals? Is there a technique to shorten the number of decimal places of the Long_Float for display only?
with Ada.Numerics;
with Ada.Text_IO; use Ada.Text_IO;
procedure Images is
sum : Standard.Long_Float;
Pi : Long_Float := Ada.Numerics.Pi;
type Fixed is delta 0.001 range -1.0e6 .. 1.0e6;
type NewFixed is range -(2 ** 31) .. +(2 ** 31 - 1);
type Fixed2 is new Long_Float range -1.0e99.. 1.0e99;
type Fixed3 is new Long_Float range -(2.0e99) .. +(2.0e99);
begin
sum:=4.99999950000e14;
Put_Line ("no fixing number: " & Pi'Image);
Put_Line (" fixed number: " & Fixed'Image(Fixed (Pi)));
Put_Line ("no fixing number: " & Long_Float'Image(sum));
Put_Line (" testing fix: " & Fixed3'Image(Fixed3 (sum)));
end Images;
Addendum:
Note that my variable sum is defined as Standard.Long_Float to agree with other variables used throughout the program.
I am adding code to show the culprit of my problem and my attempts at solving the problem. It is based on example provided by Simon Wright with sum number added by me. Looks like all I need to figure out how to insert delta into Fixed3 type since delta defines number of decimals.
’Image doesn’t have any options, see ARM2012 3.5(35) (also (55.4)).
However, Ada 202x ARM K.2(88) and 4.10(13) suggest an alternative:
with Ada.Numerics;
with Ada.Text_IO; use Ada.Text_IO;
procedure Images is
Pi : Long_Float := Ada.Numerics.Pi;
type Fixed is delta 0.001 range -1.0e6 .. 1.0e6;
begin
Put_Line (Pi'Image);
Put_Line (Fixed (Pi)'Image);
end Images;
which reports (GNAT CE 2020, FSF GCC 10.1.0)
$ ./images
3.14159265358979E+00
3.142
The Image attribute has the same parameters for all the types, so format cannot be specified. There are generic nested packages in Ada.Text_IO for handling I/O of numeric types. For your case you can instantiate Ada.Text_IO.Float_IO for Float or just use the built-in Ada.Float_Text_IO instance. You can then use the Put procedure to specify the format. There is an example in the Ada Standard:
package Real_IO is new Float_IO(Real); use Real_IO;
-- default format used at instantiation, Default_Exp = 3
X : Real := -123.4567; -- digits 8 (see 3.5.7)
Put(X); -- default format "–​1.2345670E+02"
Put(X, Fore => 5, Aft => 3, Exp => 2); -- "bbb–​1.235E+2"
Put(X, 5, 3, 0); -- "b–​123.457"
Long_Float'Image (sum) is just a regular string in the form "snnnnn.ddddd". Note that character 's' representing sign, character 'n' representing number and character 'd' representing decimal.
Thus, if you want to drop the last three decimals, just use the following
Long_Float'Image(sum)(Long_Float'Image(sum)'First .. Long_Float'Image(sum)'Last - 3);
As others have pointed out, the 'Image attribute function doesn't provide for any control over the format. However, it is certainly possible to write a function that does. See PragmARC.Images.Float_Image for a generic example.

How to recognize ID, Literals and Comments in Lex file

I have to write a lex program that has these rules:
Identifiers: String of alphanumeric (and _), starting with an alphabetic character
Literals: Integers and strings
Comments: Start with ! character, go to until the end of the line
Here is what I came up with
[a-zA-Z][a-zA-Z0-9]+ return(ID);
[+-]?[0-9]+ return(INTEGER);
[a-zA-Z]+ return ( STRING);
!.*\n return ( COMMENT );
However, I still get a lot of errors when I compile this lex file.
What do you think the error is?
It would have helped if you'd shown more clearly what the problem was with your code. For example, did you get an error message or did it not function as desired?
There are a couple of problems with your code, but it is mainly correct. The first issue I see is that you have not divided your lex program into the necessary parts with the %% divider. The first part of a lex program is the declarations section, where regular expression patterns are specified. The second part is where the action that match patterns are specified. The (optional) third section is where any code (for the compiler) is placed. Code for the compiler can also be placed in the declaration section when delineated by %{ and %} at the start of a line.
If we put your code through lex we would get this error:
"SoNov16.l", line 1: bad character: [
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: bad character: ]
"SoNov16.l", line 1: bad character: +
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: bad character: (
"SoNov16.l", line 1: unknown error processing section 1
"SoNov16.l", line 1: bad character: )
"SoNov16.l", line 1: bad character: ;
Did you get something like that? In your example code you are specifying actions (the return(ID); is an example of an action) and thus your code is for the second section. You therefore need to put a %% line ahead of it. It will then be a valid lex program.
You code is dependant on (probably) a parser, which consumes (and declares) the tokens. For testing purposes it is often easier to just print the tokens first. I solved this problem by making a C macro which will do the print and can be redefined to do the return at a later stage. Something like this:
%{
#define TOKEN(t) printf("String: %s Matched: " #t "\n",yytext)
%}
%%
[a-zA-Z][a-zA-Z0-9]+ TOKEN(ID);
[+-]?[0-9]+ TOKEN(INTEGER);
[a-zA-Z]+ TOKEN (STRING);
!.*\n TOKEN (COMMENT);
If we build and test this, we get the following:
abc
String: abc Matched: ID
abc123
String: abc123 Matched: ID
! comment text
String: ! comment text
Matched: COMMENT
Not quite correct. We can see that the ID rule is matching what should be a string. This is due to the ordering of the rules. We have to put the String rule first to ensure it matches first - unless of course you were supposed to match strings inside some quotes? You also missed the underline from the ID pattern. Its also a good idea to match and discard any whitespace characters:
%{
#define TOKEN(t) printf("String: %s Matched: " #t "\n",yytext)
%}
%%
[a-zA-Z]+ TOKEN (STRING);
[a-zA-Z][a-zA-Z0-9_]+ TOKEN(ID);
[+-]?[0-9]+ TOKEN(INTEGER);
!.*\n TOKEN (COMMENT);
[ \t\r\n]+ ;
Which when tested shows:
abc
String: abc Matched: STRING
abc123_
String: abc123_ Matched: ID
-1234
String: -1234 Matched: INTEGER
abc abc123 ! comment text
String: abc Matched: STRING
String: abc123 Matched: ID
String: ! comment text
Matched: COMMENT
Just in case you wanted strings in quotes, that is easy too:
%{
#define TOKEN(t) printf("String: %s Matched: " #t "\n",yytext)
%}
%%
\"[^"]+\" TOKEN (STRING);
[a-zA-Z][a-zA-Z0-9_]+ TOKEN(ID);
[+-]?[0-9]+ TOKEN(INTEGER);
!.*\n TOKEN (COMMENT );
[ \t\r\n] ;
"abc"
String: "abc" Matched: STRING

xText Variable/Attribute Assignment

I built a grammar in xText to recognize formal expressions of a specific format
and to use the generated object tree in Java.
This is what it looks like:
grammar eu.gemtec.device.espa.texpr.Texpr with org.eclipse.xtext.common.Terminals
generate texpr "http://www.gemtec.eu/device/espa/texpr/Texpr"
Model:
(expressions+=AbstractExpression)*
;
AbstractExpression:
MatcherExpression | Assignment;
MatcherExpression:
TerminalMatcher ({Operation.left=current} operator='or' right= MatcherExpression)?
;
TerminalMatcher returns MatcherExpression:
'(' MatcherExpression ')' | {MatcherLiteral} value=Literal
;
Literal:
CharMatcher | ExactMatcher
;
CharMatcher:
type=('text'|'number'|'symbol'|'whitespace') ('(' cardinality=Cardinality ')')?
;
/* Kardinalitäten für CharMatcher*/
Cardinality:
CardinalityMin | CardinalityMinMax | CardinalityMax| CardinalityExact
;
CardinalityMin: min=INT '->';
CardinalityMinMax: min=INT '->' max=INT;
CardinalityMax: '->' max=INT;
CardinalityExact: exact=INT;
ExactMatcher:
(ignoreCase='ignoreCase''(' expected=STRING ')') | expected=STRING
;
/* Variablenzuweisung
*
* z.B. $myVar=number
* */
Assignment:
'$' name=ID '=' expression=MatcherExpression
;
Everything works fine except for the 'cardinality' assignment.
The Expressions look like this:
text number(3) - (an arbitrary amount of letters followed by exactly 3 numbers)
symbol number(2->) - (an arbitrary amount of special characters followed by at least 2 numbers)
whitespace number(->4) - (an arbitrary amount of whitespaces followed by a maximum of 4 numbers)
number(3->6) - (at least 3 numbers but not more than 6)
When I run Eclipse with this grammar (so that my language is recognized and has code completion and so on), everything I type is shown in the "Outline"-tab as a tree-structure as it should, except for the cardinality values.
When I add a cardinality statement to a CharMatcher, the little plus appears before it, but when I click on it it just disappears.
Can anyone tell me why this does not work?
I found the solution myself, I think the problem was that the compiler could not decide which class to create at this point:
Cardinality:
CardinalityMin | CardinalityMinMax | CardinalityMax| CardinalityExact
;
CardinalityMin: min=INT '->';
CardinalityMinMax: min=INT '->' max=INT;
CardinalityMax: '->' max=INT;
CardinalityExact: exact=INT;
So I simplified the whole thing a little, it now looks like this:
Cardinality:
CardinalityMinMax | CardinalityExact
;
CardinalityMinMax: (min=INT '..' max=INT) | (min=INT '..') | ('..' max=INT);
CardinalityExact: exact=INT;
It is still not shown in the "Outline"-Tab, but I suppose that is a problem of the visualisation.
The generated classes now work as intended.

Lisp grammar in yacc

I am trying to build a Lisp grammar. Easy, right? Apparently not.
I present these inputs and receive errors...
( 1 1)
23 23 23
ui ui
This is the grammar...
%%
sexpr: atom {printf("matched sexpr\n");}
| list
;
list: '(' members ')' {printf("matched list\n");}
| '('')' {printf("matched empty list\n");}
;
members: sexpr {printf("members 1\n");}
| sexpr members {printf("members 2\n");}
;
atom: ID {printf("ID\n");}
| NUM {printf("NUM\n");}
| STR {printf("STR\n");}
;
%%
As near as I can tell, I need a single non-terminal defined as a program, upon which the whole parse tree can hang. But I tried it and it didn't seem to work.
edit - this was my "top terminal" approach:
program: slist;
slist: slist sexpr | sexpr;
But it allows problems such as:
( 1 1
Edit2: The FLEX code is...
%{
#include <stdio.h>
#include "a.yacc.tab.h"
int linenumber;
extern int yylval;
%}
%%
\n { linenumber++; }
[0-9]+ { yylval = atoi(yytext); return NUM; }
\"[^\"\n]*\" { return STR; }
[a-zA-Z][a-zA-Z0-9]* { return ID; }
.
%%
An example of the over-matching...
(1 1 1)
NUM
matched sexpr
NUM
matched sexpr
NUM
matched sexpr
(1 1
NUM
matched sexpr
NUM
matched sexpr
What's the error here?
edit: The error was in the lexer.
Lisp grammar can not be represented as context-free grammar, and yacc can not parse all lisp code.
It is because of lisp features such as read-evaluation and programmable reader. So, in order just to read an arbitrary lisp code, you need to have a full lisp running. This is not some obscure, non-used feature, but it is actually used. E.g., CL-INTERPOL, CL-SQL.
If the goal is to parse a subset of lisp, then the program text is a sequence of sexprs.
The error is really in the lexer. Your parentheses end up as the last "." in the lexer, and don't show up as parentheses in the parser.
Add rules like
\) { return RPAREN; }
\( { return LPAREN; }
to the lexer and change all occurences of '(', ')' to LPAREN and RPAREN respectively in the parser. (also, you need to #define LPAREN and RPAREN where you define your token list)
Note: I'm not sure about the syntax, could be the backslashes are wrong.
You are correct in that you need to define a non-terminal. That would be defined as a set of sexpr. I'm not sure of the YACC syntax for that. I'm partial to ANTLR for parser generators and the syntax would be:
program: sexpr*
Indicating 0 or more sexpr.
Update with YACC syntax:
program : /* empty */
| program sexpr
;
Not in YACC, but might be helpful anyway, here's a full grammar in ANTLR v3 that works for the cases you described(excludes strings in the lexer because it's not important for this example, also uses C# console output because that's what I tested it with):
program: (sexpr)*;
sexpr: list
| atom {Console.WriteLine("matched sexpr");}
;
list:
'('')' {Console.WriteLine("matched empty list");}
| '(' members ')' {Console.WriteLine("matched list");}
;
members: (sexpr)+ {Console.WriteLine("members 1");};
atom: Id {Console.WriteLine("ID");}
| Num {Console.WriteLine("NUM");}
;
Num: ( '0' .. '9')+;
Id: ('a' .. 'z' | 'A' .. 'Z')+;
Whitespace : ( ' ' | '\r' '\n' | '\n' | '\t' ) {Skip();};
This won't work exactly as is in YACC because YACC generates and LALR parser while ANTLR is a modified recursive descent. There is a C/C++ output target for ANTLR if you wanted to go that way.
Do you neccesarily need a yacc/bison parser? A "reads a subset of lisp syntax" reader isn't that hard to implement in C (start with a read_sexpr function, dispatch to a read_list when you see a '(', that in turn builds a list of contained sexprs until a ')' is seen; otherwise, call a read_atom that collects an atom and returns it when it can no longer read atom-constituent characters).
However, if you want to be able to read arbritary Common Lisp, you'll need to (at the worst) implement a Common Lisp, as CL can modify the reader run-time (and even switch between different read-tables run-time under program control; quite handy when you're wanting to load code written in another language or dialect of lisp).
It's been a long time since I worked with YACC, but you do need a top-level non-terminal. Could you be more specific about "tried it" and "it didn't seem to work"? Or, for that matter, what the errors are?
I'd also suspect that YACC might be overkill for such a syntax-light language. Something simpler (like recursive descent) might work better.
You could try this grammar here.
I just tried it, my "yacc lisp grammar" works fine :
%start exprs
exprs:
| exprs expr
/// if you prefer right recursion :
/// | expr exprs
;
list:
'(' exprs ')'
;
expr:
atom
| list
;
atom:
IDENTIFIER
| CONSTANT
| NIL
| '+'
| '-'
| '*'
| '^'
| '/'
;