How to implement EOF in Antlr4 Grammar - tsql

I have a grammar that I am using for TSQL-esque language validation. Currently the grammar rules will work with a statement such as SUM(column1) + SUM(column2).
I would like the parser to throw an error if it was given something like
SUM(column1) SUM(column2). Notice the lack of an operator between the two SUMs. Right now, if I run this statement through the parser it does not error out. Instead it will return the first part of the statement, SUM(column1) and completely disregard the rest of the statement.
Upon research, I believe the answer to my problem is adding an EOF to my grammar. I have tried to implement this in several ways, but it has not made any difference to the parsing.
This is the best way I can think to implement it in my grammar file, in the argument_list:
grammar DataAnalysis;
expression : literal #literalAtomExp
| FUNCTION=ID '(' argument_list ')' #functionExp
| INLINE_FUNCTION '(' argument_list ')' #inlineFunctionExp
| '(' expression ')' #parenthesisExp
| expression (ASTERISK|SLASH) expression #mulDivExp
| expression (PLUS|MINUS) expression #addSubExp
| <assoc=right> expression '^' expression #powerExp
| QUOTEDTEXT #stringExp
;
argument_list : expression (',' expression)* EOF //implemented here
;
literal : (TABLE_NAME=ID '.')? COLUMN_NAME=ID
| VALUE=NUMBER
;
fragment NAME : [a-zA-Z0-9_] ;
fragment LETTER : [a-zA-Z] ;
fragment DIGIT : [0-9] ;
ASTERISK : '*' ;
SLASH : '/' ;
PLUS : '+' ;
MINUS : '-' ;
INLINE_FUNCTION : 'YEAR'
| 'MONTH'
| 'DAY'
;
NUMBER : ('-')? DIGIT+ ('.' DIGIT+)? ;
ID : LETTER (NAME+) ;
QUOTEDTEXT : '"' .*? '"' ;
WHITESPACE : ' ' -> channel(HIDDEN);
Even like this the parsing doesn't pick up on the issue and returns only the first part of the query.
To summarize, when I feed the parser SUM(column1) SUM(column2) I would like it to return an error because it doesn't have any associated rule for that case.
Don't know what I am missing. Thanks for any direction.

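Augment the grammar with "expr_prime : expression EOF;", and remove the EOF from the argument_list rule. Start parsing with expr_prime(). Without an EOF at the end of the start rule, the parser is free to stop as soon as it has matched a valid prefix of the input, which is why the second SUM is silently ignored. The EOF inside argument_list is in the wrong place: it sits before the closing ')' of a function call, not at the end of the whole statement.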
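To illustrate why the start rule needs the EOF, here is a minimal Python sketch of a hand-written recursive-descent parser for a cut-down version of the expression rule. It is not ANTLR-generated code and does not use the ANTLR runtime. The names tokenize, Parser, parse_prefix, and parse_all are hypothetical, and the grammar is reduced to identifiers, numbers, function calls, parentheses, and the four binary operators.

```python
import re

# One token per match: identifier, number, or a single-character operator.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<ID>[A-Za-z][A-Za-z0-9_]*)|(?P<NUM>\d+)|(?P<OP>[-+*/(),]))"
)


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(m.lastgroup))
        pos = m.end()
    tokens.append("<EOF>")  # explicit end-of-input token, like ANTLR's EOF
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expression(self):
        # expression : atom (('+' | '-' | '*' | '/') atom)*
        self.atom()
        while self.peek() in {"+", "-", "*", "/"}:
            self.pos += 1
            self.atom()

    def atom(self):
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            self.expression()
            self.expect(")")
        elif tok[0].isalpha():
            self.pos += 1
            if self.peek() == "(":  # ID '(' expression (',' expression)* ')'
                self.pos += 1
                self.expression()
                while self.peek() == ",":
                    self.pos += 1
                    self.expression()
                self.expect(")")
        elif tok[0].isdigit():
            self.pos += 1
        else:
            raise SyntaxError(f"unexpected token {tok!r}")


def parse_prefix(text):
    """Like starting at `expression`: stops after the first valid expression."""
    p = Parser(tokenize(text))
    p.expression()
    return p.pos  # number of tokens consumed


def parse_all(text):
    """Like starting at `expr_prime : expression EOF`: the whole input must match."""
    p = Parser(tokenize(text))
    p.expression()
    p.expect("<EOF>")
    return p.pos
```

In this sketch, parse_prefix("SUM(column1) SUM(column2)") returns after consuming only the first call, while parse_all on the same input raises a SyntaxError because the next token is not the end of input.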

Related

Parse TSQL/Sybase *= conditional operator used to express outer join using ANTLR4

Sybase has a non-ANSI SQL conditional operator used to express outer joins: *=.
It's being deprecated (http://dcx.sybase.com/1200/en/dbusage/apxa-transactsqlouter-joins-aspen.html).
As we are moving from Sybase ASE to MySQL, I have started to use ANTLR4 to parse the Sybase SQL code to try to translate it into equivalent MySQL code.
I have tried adding it to the TSqlParser.g4 grammar available here: https://github.com/antlr/grammars-v4/tree/master/sql/tsql. See the '*' '=' at the end of the line below. It doesn't work, though.
// https://msdn.microsoft.com/en-us/library/ms188074.aspx
// Spaces are allowed for comparison operators.
comparison_operator
: '=' | '>' | '<' | '<' '=' | '>' '=' | '<' '>' | '!' '=' | '!' '>' | '!' '<' | '*' '='
;
I tried a few things to make it work, such as escaping \* and removing the *= assignment_operator, but nothing works. It's probably a dumb question since I'm new to ANTLR. :-(
Please help.
The input *= is being tokenised as a MULT_ASSIGN by the lexer. You defined it as two separate tokens: '*' '=', which is not the same as '*='.
If you parse the following input with your grammar:
SELECT Q
FROM T
WHERE ID * = 42;
it will go fine, but to parse this properly:
SELECT Q
FROM T
WHERE ID *= 42;
you need to do it like this:
comparison_operator
: ... | '*='
;
and to support both, do this:
comparison_operator
: ... | '*' '=' | '*='
;
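The difference between '*' '=' and '*=' comes down to longest-match tokenization. The Python sketch below is a toy lexer, not ANTLR's implementation. The names tokenize, LITERALS, accepts, ORIGINAL_RULE, and FIXED_RULE are hypothetical, and the rule is modelled simply as a set of token sequences.

```python
def tokenize(text, literals):
    """Longest-match ("maximal munch") tokenizer over a fixed set of literals."""
    ordered = sorted(literals, key=len, reverse=True)  # try longer literals first
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        for lit in ordered:
            if text.startswith(lit, pos):
                tokens.append(lit)
                pos += len(lit)
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return tokens


# The lexer knows '*', '=' and the combined '*=' (MULT_ASSIGN in the answer above).
LITERALS = ["*", "=", "*="]


def accepts(tokens, alternatives):
    """True if the token list equals one alternative of a parser rule."""
    return tuple(tokens) in alternatives


# comparison_operator as written in the question: only the two-token form.
ORIGINAL_RULE = {("*", "=")}
# comparison_operator with both forms, as suggested in the answer.
FIXED_RULE = {("*", "="), ("*=",)}
```

With these definitions, the input "*=" becomes the single token '*=' and is rejected by ORIGINAL_RULE, while "* =" becomes two tokens and is accepted. FIXED_RULE accepts both spellings.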

Visual studio code user snippets capitalize not working properly

I wrote this snippet:
"State": {
"prefix": "state",
"body": [
"const [$1, set${1:/capitalize}] = useState($2);"
],
"description": "Adds state"
},
If I enter test for $1, I expect this result:
const [test, setTest] = useState($2);
But I get this result:
const [/capitalize, set/capitalize] = useState();
In the official docs I found this rule: '${' int ':' '/upcase' | '/downcase' | '/capitalize' '}'.
Could you please tell me what I am doing wrong?
You can use the snippet below for the requested output:
const [$1, set${1/(.*)/${1:/capitalize}/}] = useState($2);
The output (if I enter test for $1) will be:
const [test, setTest] = useState();
Let's look at why your version ${1:/capitalize} doesn't work:
Here is a portion of the snippet grammar you cited from https://code.visualstudio.com/docs/editor/userdefinedsnippets
tabstop ::= '$' int
| '${' int '}'
| '${' int transform '}'
-snip-
transform ::= '/' regex '/' (format | text)+ '/' options
format ::= '$' int | '${' int '}'
| '${' int ':' '/upcase' | '/downcase' | '/capitalize' '}'
So initially it looks like ${1:/capitalize} is correct. Looking only at the last line of the grammar above, it seems that
'${' int ':' '/capitalize' '}'
is a valid option. But you have to trace through the grammar to use it properly. The format syntax can only be used in a transform. We see this in:
transform ::= '/' regex '/' (format | text)+ '/' options
So right there your version does not include a transform: it lacks the regex that must come first. Those '/upcase' | '/downcase' | '/capitalize' options can only be used as part of a transform with a regex (the regex may be empty, but that doesn't help you, and you still need the regex entry point anyway).
Here is the general form of a transform:
${someInt/regex captures here/do something with the captures here, like ${1:/capitalize} /}
Note that the first $someInt is a tabstop - it could be $1 for example, but the second $1 (with the capitalize) is NOT a tabstop but a reference to the first capture group from the preceding regex. So a transform can only transform something that has been captured by a regex.
The grammar requires that the format option be part of a transform, and a transform requires a regex. The $n's in the format part refer to capture groups, not tabstop variables.
I hope this all makes sense.
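The distinction between a placeholder and a transform can be modelled with a toy expander in Python. This is only a sketch of a tiny subset of the snippet syntax, not VS Code's implementation. The names expand, PLACEHOLDER, TRANSFORM, and TABSTOP are hypothetical. It also assumes that /capitalize upper-cases only the first character, and that a "${N:text}" outside a transform is read as a placeholder with default text, which is consistent with the "/capitalize" output shown in the question.

```python
import re

# ${N/regex/${G:/capitalize}/}  : a transform on tabstop N using capture group G
TRANSFORM = re.compile(r"\$\{(\d+)/(.*?)/\$\{(\d+):/capitalize\}/\}")
# ${N:default text}             : a placeholder with default text
PLACEHOLDER = re.compile(r"\$\{(\d+):([^}]*)\}")
# $N                            : a plain tabstop
TABSTOP = re.compile(r"\$(\d+)")
MARKER = re.compile("\x00(\\d+)\x00")


def capitalize(s):
    # Assumption: upper-case the first character, leave the rest unchanged.
    return s[:1].upper() + s[1:]


def expand(body, typed=None):
    """Expand a snippet body, given the text typed into each tabstop."""
    values = dict(typed or {})
    transforms = []

    def stash(m):
        # Hide transforms so their inner "${G:/capitalize}" is not read as a placeholder.
        transforms.append(m)
        return f"\x00{len(transforms) - 1}\x00"

    body = TRANSFORM.sub(stash, body)

    # Outside a transform, "${N:text}" only supplies default text for tabstop N.
    for m in PLACEHOLDER.finditer(body):
        values.setdefault(int(m.group(1)), m.group(2))
    body = PLACEHOLDER.sub(lambda m: values[int(m.group(1))], body)
    body = TABSTOP.sub(lambda m: values.get(int(m.group(1)), ""), body)

    def run(m):
        t = transforms[int(m.group(1))]
        tabstop, regex, group = int(t.group(1)), t.group(2), int(t.group(3))
        # The inner number is a capture group of `regex`, not a tabstop.
        return re.sub(regex, lambda c: capitalize(c.group(group)),
                      values.get(tabstop, ""), count=1)

    return MARKER.sub(run, body)


BROKEN = "const [$1, set${1:/capitalize}] = useState($2);"
FIXED = "const [$1, set${1/(.*)/${1:/capitalize}/}] = useState($2);"
```

In this model, expand(BROKEN) treats "/capitalize" as default text for tabstop 1, while expand(FIXED, {1: "test"}) applies the regex to the typed text and capitalizes the first capture group.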

VS code snippet substitution (transform) works with variables not placeholders

VS Code supposedly supports substitution, i.e., transforms, in user-defined snippets. But it's working for me only with (built-in) variables and not placeholders.
See the following snippet:
"substitution test" : {
"prefix" : "abc",
"body": [
"${TM_FILENAME}",
"${TM_FILENAME/^([^.]+)\\..+$/$1/}",
"${TM_FILENAME/^([^.]+)\\..+$/${1:/capitalize}/}",
"${TM_FILENAME/^([^.]+)\\..+$/${1:/upcase}/}",
"${2:showMeInAllCapsWhenReferenced}",
"${2/upcase}"
]
}
The output of lines 1-4 is as expected:
users.actions.ts
users
Users
USERS
In line 5 there is a placeholder and I reference it again in line 6. I want it to show both times, once as I type it, and again in all-caps. So e.g.:
fooFoo
FOOFOO
But the actual output is
showMeInAllCapsWhenReferenced
${2/upcase}
Is substitution/transformation of referenced placeholders (as I type) even possible?
Your last two lines should be:
"${2:showMeInAllCapsWhenReferenced}",
"${2/(.*)/${1:/upcase}/}"
The transform is applied only after the final Tab, so the replacement does not technically happen "as you type" the placeholder.
From placeholder transforms:
The inserted text is matched with the regular expression and the match
or matches - depending on the options - are replaced with the
specified replacement format text.
So you cannot just use /upcase, for example, without the regex capture, as you tried to do on line 6. It can only transform a regex match.
Looking at the grammar section:
transform ::= '/' regex '/' (format | text)+ '/' options
format ::= '$' int | '${' int '}'
| '${' int ':' '/upcase' | '/downcase' | '/capitalize' '}'
| '${' int ':+' if '}'
| '${' int ':?' if ':' else '}'
| '${' int ':-' else '}' | '${' int ':' else '}'
we see that the :/upcase must follow a regex. (The "format", of which upcase is one, must follow a "regex" in a "transform".)
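As a rough check of the rule that a format modifier always acts on a regex capture, the transforms from the question can be emulated with Python's re.sub. This is a sketch under stated assumptions, not VS Code's implementation: transform, MODIFIERS, and FILENAME_RE are hypothetical names, no regex options are modelled (only the first match is replaced), and /capitalize is assumed to upper-case only the first character.

```python
import re

# Format modifiers, applied to the text of one capture group.
MODIFIERS = {
    None: lambda s: s,
    "upcase": str.upper,
    "downcase": str.lower,
    "capitalize": lambda s: s[:1].upper() + s[1:],
}


def transform(value, regex, group, modifier=None):
    """Emulate ${VAR/regex/${group:/modifier}/}: replace the first regex match
    with the (optionally modified) text of the given capture group."""
    fmt = MODIFIERS[modifier]
    return re.sub(regex, lambda m: fmt(m.group(group) or ""), value, count=1)


# The regex from the snippet body, with the JSON escaping ("\\.") removed.
FILENAME_RE = r"^([^.]+)\..+$"

# Lines 2-4 of the snippet, applied to the built-in variable TM_FILENAME.
stem = transform("users.actions.ts", FILENAME_RE, 1)
capitalized = transform("users.actions.ts", FILENAME_RE, 1, "capitalize")
upper = transform("users.actions.ts", FILENAME_RE, 1, "upcase")

# The corrected line 6: ${2/(.*)/${1:/upcase}/} applied to the placeholder text.
mirrored = transform("fooFoo", r"(.*)", 1, "upcase")
```

The same transform function handles both the variable and the placeholder case, because in both the modifier is applied to a capture group rather than to the tabstop itself.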

ANTLR matches a similar rule (but fails on the parts that differ)

I'm creating an Xtext plugin and for some reason, the following line incorrectly matches the StringStatement rule when it should match the UnstringStatement rule:
UNSTRING test2 DELIMITED BY " " INTO test2 END-UNSTRING
Here is my grammar:
Program:
(elements+=Elemental)*
(s+=Statement)*
;
Variable_Name:
varName=ID ("-" ID)*
;
Variable_Reference:
varRef=ID ("-" ID)*
;
Elemental:
'VAR' var=Variable_Name
;
Statement:
(us=UnstringStatement|s=StringStatement)
;
StringParam:
Variable_Reference | STRING
;
StringStatement:
'STRING' in=StringParam 'DELIMITED BY SIZE INTO' out=Variable_Reference 'END-STRING'
;
UnstringStatement:
'UNSTRING' in=StringParam 'DELIMITED BY' string2=STRING 'INTO' (outs+=Variable_Reference)* 'END-UNSTRING'
;
When I run the project as an Eclipse Application, the 'UNSTRING' token is highlighted (correctly), but the rest of the line has the error "Mismatched character '"' expecting 'S'." The 'S' that the error refers to is from 'SIZE'.
Any idea why the two rules overlap like this?
EDIT, forgot the STRING rule:
terminal STRING :
'"' ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|'"') )* '"' |
"'" ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|"'") )* "'"
;
EDIT 2:
After stepping through some of the Lexer code, I discovered that the token "DELIMITED BY" is incorrectly matched to "DELIMITED BY SIZE INTO", which then fails.
EDIT 3 FIXED:
I fixed this, but have no idea why it works. I just added a terminal DELIMITED_BY:
terminal DELIMITED_BY: 'DELIMITED BY';
StringStatement:
'STRING' in=StringParam DELIMITED_BY 'SIZE INTO' out=Variable_Reference 'END-STRING'
;
UnstringStatement:
'UNSTRING' in=StringParam DELIMITED_BY string2=STRING 'INTO' (outs+=Variable_Reference)* 'END-UNSTRING'
;
The STRING token looks too greedy. In ANTLR the expression should be:
terminal STRING :
'"' ( '\\' . | !('\\'|'"') )*? '"' |
"'" ( '\\' . | !('\\'|"'") )*? "'"
;

Parsing and converting 4Test to Perl

I want to convert 4Test scripts to Perl. I have been using Parse::RecDescent in Perl, but I am still overwhelmed by the task.
An example of 4Test looks something like this:
ParseSMSPlans (STRING sDir)
{
STRING sFile;
STRING sDirSMSPlan = sDir + "sms_plans\";
STRING sDirPlan = sDir + "plan\";
STRING sDirDeal = sDir + "deal\";
STRING sDirProduct = sDir + "product\";
STRING sLine, sType, sName;
HFILE hIn;
FILEINFO fiFile;
LIST OF FILEINFO lfInfo = SYS_GetDirContents (sDirSMSPlan);
...
}
...
This is my Parse::RecDescent grammar:
my $grammar = q{
#-----------------identifiers and datatypes-------------------#
identifier : /[a-z]\w+/
binops : '+' | '-' | '/' | '*' | '%' | '**'
lbinops: '!' | '<' | '>' | '>='| '<='| '&&'| '||' | '=='
integer: /\d+/ {print "hello $item[-1]" if $::debugging;}
number : /(\d+|\d*\.\d+)/ {print "hello $item[-1]" if $::debugging;}
string : /"(([^"]*)(\\")?)*"/
operation : number binops number operation(s?)
datatype : /[a-zA-Z]\w*/
definition : datatype expression(s) #{print "hello $item[-1]" if $::debugging;}
|datatype expression(s) "=" expression(s) #{print "hello $item[-1] = $item[-2]" if $::debugging;}
statement : ifexp | elsexp | elseifexp |forexp | feachexp | whexp | swcexp
#------------------Expressed Values-----------------------------#
program : expression
expression : number {print $item[1] if $::debugging}
| integer
| assignment
| operation
| identifier binops expression
| number binops expression
#------------------Conditionals---------------------------------------#
ifexp : 'if' '(' expression(s) ')' '{' expression(s) '}' elsexp(?)
elsexp : 'else' '{' expression(s) '}'
elseifexp: 'else' 'if' '(' expression(s) ')' '{' expression(s) '}'
forexp : 'for' '(' expression ';' expression ';' expression ')' '{' expression(s) '}'
| 'for' assignment 'to' number expression(s) | 'for' assignment 'to' number '{' expression(s) '}'
feachexp : 'for each' expression 'in' expression '{' expression(s) '}'
whexp : 'while' '(' expression ')' '{' expression(s) '}'
casest : 'case' expression(s /,/) ':'
swcexp : 'switch' identifier '{' casest(s) '}' expression(s) 'default'
assignment : identifier(s) '=' expression
};
So, I'm looking at adding "$" to every variable name and chopping
datatypes. For the most part my grammar works, though I have not fully
tested it yet, mainly because Parse::RecDescent has been a bit
tricky for me to understand. I'm also not really sure if it's the best
way to complete my task, or the fastest, for that matter.
My main concern is whether anyone feels that PRD can handle what
I'm asking it to do, or whether simple (or complex) regexes will suffice. I would
appreciate any help anyone could offer on this.
May I suggest that you simplify your 4Test scripts by replacing all operator expressions with subroutine calls? You would still be able to run them as 4Test scripts and thus prove that they work, but you would greatly simplify the parsing problem: operator expressions are much more difficult to parse than straight procedure calls. Taken to the limit, this process will produce 4Test scripts that can almost be run directly by Perl plus some replacement routines for 4Test functions.
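As a rough illustration of how little work is left once expressions are plain calls, the Python sketch below adds the "$" sigil to every identifier that is not a function name and leaves string literals alone. It is an assumption-laden sketch, not a 4Test parser: add_sigils and TOKEN are hypothetical names, the function names Concat, Add, and Mul in the usage are invented stand-ins for whatever replacement routines you write, and string literals are assumed to contain no embedded double quotes.

```python
import re

# Either a double-quoted string literal (kept as-is), or an identifier that is
# NOT followed by '(' (i.e. a variable rather than a function name).
TOKEN = re.compile(r'"[^"]*"|\b[A-Za-z_]\w*\b(?!\s*\()')


def add_sigils(expr):
    """Prefix variables in a call-only expression with Perl's '$' sigil."""
    def fix(m):
        tok = m.group(0)
        return tok if tok.startswith('"') else "$" + tok
    return TOKEN.sub(fix, expr)


# Hypothetical call-only rewrites of operator expressions.
example_concat = add_sigils('Concat(sDir, "plan")')
example_nested = add_sigils("Add(iCount, Mul(iRate, 2))")
```

A single regular expression is enough here only because nested calls need no operator precedence or associativity handling, which is the simplification the suggestion above relies on.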