Boolean in ANTLR4?

This is my grammar:
grammar test;
assignStatement : ID '=' BOOL ';' ;
ID : ID_LETTER (ID_LETTER | DIGIT)* ;
fragment ID_LETTER : [a-z] | [A-Z] | '_' ;
fragment DIGIT : [0-9] ;
BOOL : 'true' | 'false' ;
WS : [ \t\r\n]+ -> skip;
But when I test it with this input:
x = true ;
I get this error:
mismatched input 'true' expecting BOOL
Why do I get this error, and how can I fix it? Any help is appreciated.

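The lexer tokenizes 'true' as ID, not BOOL. Both rules match the same four characters, and when two lexer rules match input of the same length, ANTLR picks the one defined first. Move the BOOL rule above ID to fix this: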
assignStatement : ID '=' BOOL ';' ;
fragment ID_LETTER : [a-z] | [A-Z] | '_' ;
fragment DIGIT : [0-9] ;
BOOL : 'true' | 'false' ;
ID : ID_LETTER (ID_LETTER | DIGIT)* ;
WS : [ \t\r\n]+ -> skip;
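The tie-breaking behaviour described above can be imitated outside ANTLR. The following is only a rough Python sketch, not ANTLR's actual implementation: the tokenize helper and the PUNCT rule (standing in for the implicit '=' and ';' tokens) are hypothetical names introduced for illustration. It picks the longest match at each position and, on a tie, the rule listed first.

```python
import re


def tokenize(text, rules):
    """Tokenize `text` with ordered (name, pattern) rules.

    At each position the longest match wins; on a tie, the rule listed
    first wins. Whitespace is skipped.
    """
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


# Hypothetical stand-ins for the grammar's lexer rules.
ID = ("ID", r"[A-Za-z_][A-Za-z_0-9]*")
BOOL = ("BOOL", r"true|false")
PUNCT = ("PUNCT", r"[=;]")

id_first = tokenize("x = true ;", [ID, BOOL, PUNCT])    # 'true' becomes ID
bool_first = tokenize("x = true ;", [BOOL, ID, PUNCT])  # 'true' becomes BOOL
```

With BOOL listed first, an identifier such as trueish is still tokenized as ID, because the longer match takes priority over rule order.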

Related

How to implement EOF in Antlr4 Grammar

I have a grammar that I am using for TSQL-esque language validation. Currently the grammar rules will work with a statement such as SUM(column1) + SUM(column2).
I would like the parser to throw an error if it was given something like
SUM(column1) SUM(column2). Notice the lack of an operator between the two SUMs. Right now, if I run this statement through the parser it does not error out. Instead it will return the first part of the statement, SUM(column1) and completely disregard the rest of the statement.
After some research, I believe the answer to my problem is adding an EOF to my grammar. I have tried to implement this in several ways, but it has not made any difference to the parsing.
This is the best way I can think to implement it in my grammar file, in the argument_list:
grammar DataAnalysis;
expression : literal #literalAtomExp
| FUNCTION=ID '(' argument_list ')' #functionExp
| INLINE_FUNCTION '(' argument_list ')' #inlineFunctionExp
| '(' expression ')' #parenthesisExp
| expression (ASTERISK|SLASH) expression #mulDivExp
| expression (PLUS|MINUS) expression #addSubExp
| <assoc=right> expression '^' expression #powerExp
| QUOTEDTEXT #stringExp
;
argument_list : expression (',' expression)* EOF //implemented here
;
literal : (TABLE_NAME=ID '.')? COLUMN_NAME=ID
| VALUE=NUMBER
;
fragment NAME : [a-zA-Z0-9_] ;
fragment LETTER : [a-zA-Z] ;
fragment DIGIT : [0-9] ;
ASTERISK : '*' ;
SLASH : '/' ;
PLUS : '+' ;
MINUS : '-' ;
INLINE_FUNCTION : 'YEAR'
| 'MONTH'
| 'DAY'
;
NUMBER : ('-')? DIGIT+ ('.' DIGIT+)? ;
ID : LETTER (NAME+) ;
QUOTEDTEXT : '"' .*? '"' ;
WHITESPACE : ' ' -> channel(HIDDEN);
Even like this the parsing doesn't pick up on the issue and returns only the first part of the query.
To summarize, when I feed the parser SUM(column1) SUM(column2) I would like it to return an error because it doesn't have any associated rule for that case.
I don't know what I am missing. Thanks for any direction.
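Without an EOF at the end of the start rule, the parser typically stops as soon as it has matched a valid prefix of the input and ignores the rest. Add a start rule "expr_prime : expression EOF;" to the grammar, remove the EOF from the argument_list rule, and start parsing with expr_prime().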
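To see why the EOF belongs on the start rule, here is a minimal hand-written recursive-descent sketch in Python. It is not ANTLR-generated code: the Parser class, its simplified grammar (no operator precedence), and the helper names are assumptions made for illustration. Parsing expression alone stops after the first complete expression, while the EOF-anchored entry point rejects trailing input.

```python
import re


def tokenize(text):
    # Identifiers, integers, and single-character operators/punctuation.
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def expression(self):
        # expression : atom (('+' | '-' | '*' | '/') atom)*
        self.atom()
        while self.peek() in ("+", "-", "*", "/"):
            self.pos += 1
            self.atom()

    def atom(self):
        # atom : NAME '(' expression ')' | NAME | NUMBER
        token = self.peek()
        if token is None or not re.fullmatch(r"\w+", token):
            raise SyntaxError(f"unexpected token {token!r}")
        self.pos += 1
        if self.peek() == "(":
            self.pos += 1
            self.expression()
            self.expect(")")

    def expression_then_eof(self):
        # Counterpart of `expr_prime : expression EOF ;`
        self.expression()
        if self.peek() is not None:
            raise SyntaxError(f"extraneous input {self.peek()!r}")


def consumed_without_eof(text):
    """Return the tokens consumed when parsing `expression` with no EOF check."""
    parser = Parser(tokenize(text))
    parser.expression()
    return parser.tokens[:parser.pos]
```

For SUM(column1) SUM(column2), consumed_without_eof returns only the tokens of the first call, whereas Parser(tokenize(...)).expression_then_eof() raises a SyntaxError at the second SUM.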

Antlr v4: 'mismatched input'

Basically, I'm trying to run this Pascal program through ANTLR 4 in PowerShell.
PROGRAM AddTwoNumbers;
VAR Num1, Num2, Sum : Integer;
BEGIN
Write('Input number 1:');
Readln(Num1);
Writeln('Input number 2:');
Readln(Num2);
Sum := Num1 + Num2;
Writeln(Sum);
Readln;
END.
However, I keep getting the following error in PowerShell:
line 8:4 mismatched input 'Writeln' expecting {'END', ';'}
Here are the relevant parts of my grammar file:
simpleStatement
: assignmentStatement
| procedureStatement
| exitStatement
| gotoStatement
| emptyStatement
| outputStatement
| readKey
;
outputStatement
: ( 'Writeln' | 'Write' ) LPAREN string RPAREN SEMI
input
;
input
: inputStatement
| readKey
;
inputStatement
: 'Readln' ( LPAREN identifier RPAREN )* SEMI
;
readKey
: 'Readkey' SEMI
;
How do I fix this error? Thanks.
Your code compiles and runs perfectly on my machine. Have you tried compiling it with another compiler?

ANTLR matches a similar rule (but fails on the parts that differ)

I'm creating an Xtext plugin and for some reason, the following line incorrectly matches the StringStatement rule when it should match the UnstringStatement rule:
UNSTRING test2 DELIMITED BY " " INTO test2 END-UNSTRING
Here is my grammar:
Program:
(elements+=Elemental)*
(s+=Statement)*
;
Variable_Name:
varName=ID ("-" ID)*
;
Variable_Reference:
varRef=ID ("-" ID)*
;
Elemental:
'VAR' var=Variable_Name
;
Statement:
(us=UnstringStatement|s=StringStatement)
;
StringParam:
Variable_Reference | STRING
;
StringStatement:
'STRING' in=StringParam 'DELIMITED BY SIZE INTO' out=Variable_Reference 'END-STRING'
;
UnstringStatement:
'UNSTRING' in=StringParam 'DELIMITED BY' string2=STRING 'INTO' (outs+=Variable_Reference)* 'END-UNSTRING'
;
When I run the project as an Eclipse Application, the 'UNSTRING' token is highlighted (correctly), but the rest of the line has the error "Mismatched character '"' expecting 'S'." The 'S' that the error refers to is from 'SIZE'.
Any idea why the two rules overlap like this?
EDIT, forgot the STRING rule:
terminal STRING :
'"' ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|'"') )* '"' |
"'" ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|"'") )* "'"
;
EDIT 2:
After stepping through some of the Lexer code, I discovered that the token "DELIMITED BY" is incorrectly matched to "DELIMITED BY SIZE INTO", which then fails.
EDIT 3 FIXED:
I fixed this, but have no idea why it works. I just added a terminal DELIMITED_BY:
terminal DELIMITED_BY: 'DELIMITED BY';
StringStatement:
'STRING' in=StringParam DELIMITED_BY 'SIZE INTO' out=Variable_Reference 'END-STRING'
;
UnstringStatement:
'UNSTRING' in=StringParam DELIMITED_BY string2=STRING 'INTO' (outs+=Variable_Reference)* 'END-UNSTRING'
;
The STRING Token looks too greedy. In ANTLR the expression should be
terminal STRING :
'"' ( '\\' . | !('\\'|'"') )*? '"' |
"'" ( '\\' . | !('\\'|"'") )*? "'"
;

antlr4 - tsql parser: ignore not known statements

I am new here at SO. I want to write a T-SQL (Sybase) parser that only listens for a few relevant SQL statements.
Is it possible to ignore irrelevant statements without writing the complete T-SQL syntax in the grammar file, so that errors like "line 8:2 mismatched input 'INSERT' expecting {EXEC, BEGIN, END, IF}" no longer appear?
My input is the following SQL stored procedure (just an example):
CREATE PROCEDURE mySQL (#BaseLoglevel INT,
#ReleaseId INT,
#TargetSystem VARCHAR (5),
#IgnoreTimeStamp INT)
AS
BEGIN
INSERT INTO Departments
(DepartmentID, DepartmentName, DepartmentHeadID)
VALUES (600, 'Eastern Sales', 501)
EXEC DoThis
BEGIN
EXEC DoSQLProc
END
if (#x=0)
begin
exec DoSQL
end
else begin
exec ReadTables
end
exec DoThat
exec DoOther
END
My grammar file contains nothing that describes the INSERT statement, so I want to ignore this unknown input. Is that possible?
This is my grammar file:
grammar Tsql;
/************Parser Rules*******************/
file : createProcedure sqlBlock;
createProcedure: CREATE PROC ID paramList? AS;
//Params of create procedure
paramList: LPAREN (sqlParam)(COMMA sqlParam)* RPAREN;
sqlParam: AT_SIGN ID sqlType; //(EQ defaultValue)?;
sqlType: (VARCHARTYPE | NUMERICTYPE | INTTYPE | CHARTYPE) length?;
length: LPAREN INT RPAREN;
sqlBlock : BEGIN sql* END;
sql: sqlBlock
| sqlIf
| sqlExec
;
sqlExec: EXEC ID (LPAREN sqlExprList? RPAREN)* ; //SQLCall
//IF-rule
sqlIf: IF LPAREN sqlexpr RPAREN sqlIfBlock (sqlElseBlock)?;
sqlElseBlock: ELSE BEGIN sql* END;
sqlIfBlock: BEGIN sql* END;
/* T-SQL expressions */
sqlexpr
: ID LPAREN sqlExprList? RPAREN # K
| AT_SIGN ID # SQLVar
| LPAREN sqlexpr RPAREN # SQLParens
| sqlexpr EQ INT # SQLEqual
| sqlexpr NOT_EQ INT # SQLNotEqual
| sqlexpr LTH sqlexpr # SQLLessThan
| sqlexpr GTH sqlexpr # SQLGreaterThan
| sqlexpr LEQ sqlexpr # SQLLessEqual
| sqlexpr GEQ sqlexpr # SQLGreaterEqual
| sqlexpr (PLUS|MINUS) # SQLAddSub
| sqlexpr (MUTLIPLY|DIVIDE) # SQLMultDiv
| LPAREN sqlexpr RPAREN # SQLParens
| NOT sqlexpr # SQLNot
;
sqlExprList : sqlexpr (',' sqlexpr)* ; // arg list
/************Lexer Rules*******************/
//createProcedure
CREATE : 'CREATE' | 'create';
PROC : 'PROCEDURE' | 'procedure';
AS : 'AS'|'as';
EXEC : ('EXEC'|'exec');
//SqlTypes
INTTYPE: 'int'|'INT';
VARCHARTYPE : 'varchar'|'VARCHAR';
NUMERICTYPE : 'numeric'|'NUMERIC';
CHARTYPE : 'char'| 'CHAR';
//SqlBlock
BEGIN: 'BEGIN' | 'begin';
END: 'END' | 'end';
//If
IF: 'IF' | 'if';
ELSE : 'ELSE' | 'else';
RETURN : ('RETURN' | 'return');
DECLARE : ('DECLARE'|'declare');
AT_SIGN : '#';
ID : LETTER (LETTER | [0-9])* ;
APOSTROPH : [\'];
QUOTE : ["];
LPAREN : '(';
RPAREN : ')';
COMMA : ',';
SEMICOLON : ';';
DOT : '.';
EQ : '=';
NOT_EQ : ('!='|'<>');
LTH : ('<');
GTH : ('>');
LEQ : ('<=');
GEQ : ('=>');
RBRACK : ']';
LBRACK : '[';
PLUS : '+';
MINUS : '-';
MUTLIPLY : '*';
DIVIDE : '/';
COLON : ':';
NOT : ('NOT' | '!');
INT : [0-9]+ ;
ML_COMMENT
: '/*'.*? '*/' -> skip
;
SL_COMMENT
: '//' .*? '\n' -> skip
;
WS : [ \t\r\n]+ -> skip;
fragment
LETTER : [a-zA-Z] ;
Many thanks in advance.
Is it possible to ignore irrelevant statements without writing the complete T-SQL syntax in the grammar file?
You could do something like this:
file
: unit* EOF
;
unit
: my_interesting_statement
| . // any token
;
my_interesting_statement
: createProcedure sqlBlock
// | other statements here?
;
// parser rules
// lexer rules
// Last lexer rule catches any character
ANY
: .
;
The file rule now matches zero or more units followed by EOF. A unit first tries to match my_interesting_statement; when that is not possible, the last alternative in the unit rule, the ., matches a single token (that's right: a . inside a parser rule matches a single token, not a single character). The ANY lexer rule, placed last, turns every character that no other lexer rule matches into a token, so the lexer itself never reports an unrecognized character.
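The "interesting statement or any single token" idea can be sketched without ANTLR. The Python below is only an illustration under simplifying assumptions: the tokenize and find_exec_calls helpers are hypothetical, and the only interesting statement is EXEC followed by a name. Everything else is skipped one token at a time, which plays the role of the . alternative.

```python
import re


def tokenize(text):
    # Words, integers, quoted strings, and any other single non-space character.
    return re.findall(r"[A-Za-z_]\w*|\d+|'[^']*'|\S", text)


def find_exec_calls(tokens):
    """Return the names of procedures called with EXEC, skipping everything else.

    Mirrors `unit : my_interesting_statement | . ;` where the only
    interesting statement here is `EXEC ID`.
    """
    calls = []
    pos = 0
    while pos < len(tokens):
        is_exec = tokens[pos].upper() == "EXEC"
        has_name = pos + 1 < len(tokens) and re.fullmatch(r"[A-Za-z_]\w*", tokens[pos + 1])
        if is_exec and has_name:
            calls.append(tokens[pos + 1])
            pos += 2  # consumed an interesting statement
        else:
            pos += 1  # the `.` alternative: skip exactly one token
    return calls


sql = """
BEGIN
  INSERT INTO Departments (DepartmentID, DepartmentName)
  VALUES (600, 'Eastern Sales')
  EXEC DoThis
  exec DoThat
END
"""
print(find_exec_calls(tokenize(sql)))  # prints ['DoThis', 'DoThat']
```

The INSERT statement is never described anywhere, yet it causes no error, because each of its tokens falls through to the skip branch.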

Parsing and converting 4Test to Perl

I want to convert 4Test scripts to Perl. I have been using Parse::RecDescent in Perl, but I am still overwhelmed by the task. Here is an example.
An example of 4Test is something like:
ParseSMSPlans (STRING sDir)
{
STRING sFile;
STRING sDirSMSPlan = sDir + "sms_plans\";
STRING sDirPlan = sDir + "plan\";
STRING sDirDeal = sDir + "deal\";
STRING sDirProduct = sDir + "product\";
STRING sLine, sType, sName;
HFILE hIn;
FILEINFO fiFile;
LIST OF FILEINFO lfInfo = SYS_GetDirContents (sDirSMSPlan);
...
}
...
This is my Parse::RecDescent grammar
my $grammar = q{
#-----------------identifiers and datatypes-------------------#
identifier : /[a-z]\w+/
binops : '+' | '-' | '/' | '*' | '%' | '**'
lbinops: '!' | '<' | '>' | '>='| '<='| '&&'| '||' | '=='
integer: /\d+/ {print "hello $item[-1]" if $::debugging;}
number : /(\d+|\d*\.\d+)/ {print "hello $item[-1]" if $::debugging;}
string : /"(([^"]*)(\\")?)*"/
operation : number binops number operation(s?)
datatype : /[a-zA-Z]\w*/
definition : datatype expression(s) #{print "hello $item[-1]" if $::debugging;}
|datatype expression(s) "=" expression(s) #{print "hello $item[-1] = $item[-2]" if $::debugging;}
statement : ifexp | elsexp | elseifexp |forexp | feachexp | whexp | swcexp
#------------------Expressed Values-----------------------------#
program : expression
expression : number {print $item[1] if $::debugging}
| integer
| assignment
| operation
| identifier binops expression
| number binops expression
#------------------Conditionals---------------------------------------#
ifexp : 'if' '(' expression(s) ')' '{' expression(s) '}' elsexp(?)
elsexp : 'else' '{' expression(s) '}'
elseifexp: 'else' 'if' '(' expression(s) ')' '{' expression(s) '}'
forexp : 'for' '(' expression ';' expression ';' expression ')' '{' expression(s) }'
| 'for' assignment 'to' number expression(s) | 'for' assignment 'to' number '{' expression(s) '}'
feachexp : 'for each' expression 'in' expression '{' expression(s) '}'
whexp : 'while' '(' expression ')' '{' expression(s) '}'
casest : 'case' expression(s /,/) ':'
swcexp : 'switch' identifier '{' casest(s) '}' expression(s) 'default'
assignment : identifier(s) '=' expression
};
So, I'm looking at adding "$" to every variable name and stripping the
datatypes. For the most part my grammar works, though I haven't fully
tested it yet, because Parse::RecDescent has been a bit
tricky for me to understand, and I'm not really sure it's the best
way to complete my task, or the fastest, for that matter.
My main concern is whether anyone feels that PRD can handle what
I'm asking it to do, or whether simple (or complex) regexes will suffice. I would
appreciate any help anyone could offer on this.
May I suggest that you first simplify your 4Test scripts by replacing all operator expressions with subroutine calls? The scripts would still run as 4Test, so you could prove that they still work, and the parsing problem would become much simpler, because operator expressions are much harder to parse than plain procedure calls. Taken to the limit, this process produces 4Test scripts that Perl can almost run directly, given some replacement routines for the 4Test functions.