Why does a similar rule in the ANTLR grammar file produce a completely different tree? - tsql

I am using the grammar file at https://github.com/antlr/grammars-v4/blob/master/sql/tsql/TSqlParser.g4. It has a built_in_functions grammar rule. I want to parse a new function, DAYZ, as a built-in function. I added it to the .g4 file as follows:
built_in_functions
// https://msdn.microsoft.com/en-us/library/ms173784.aspx
: BINARY_CHECKSUM '(' '*' ')' #BINARY_CHECKSUM
// https://msdn.microsoft.com/en-us/library/ms186819.aspx
| DATEADD '(' datepart=ID ',' number=expression ',' date=expression ')' #DATEADD
| DAYZ '(' date=expression ')' #DAYZ
When I use grun to test the grammar, I get unexpected results for DAYZ. For a DATEDIFF I get what I expect.
For DAYZ, I get the following tree
Why does the parser not treat DAYZ as satisfying the built_in_functions rule, as it does for DATEDIFF? If the parser eventually recognizes DAYZ as an _Id, it should do the same for DATEDIFF. There must be something wrong in the way I am introducing DAYZ into the grammar, but I can't figure it out. Any help is appreciated. Apologies if I am not using the correct ANTLR terminology; I am a newbie to ANTLR.
I am using antlr-4.9.2-complete.jar

Move your lexer rule for DAYZ so that it appears before the ID rule in the TSqlLexer.g4 file.
Since the id_ rule is recognizing the token, it must be getting tokenized as an ID token. This happens if your DAYZ rule definition comes after the ID rule definition.
When two lexer rules match the same string of input characters (here, "DAYZ"), ANTLR uses whichever rule appears first in the grammar.
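To make the tie-breaking concrete, here is a small Python sketch of a toy lexer that follows the same two principles: the longest match wins, and on a tie the rule listed first wins. It is only a simplified model, not ANTLR itself; the `tokenize` function and the shortened ID pattern are assumptions made for illustration.

```python
import re

def tokenize(text, rules):
    """Toy lexer: longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly greater: a later rule with an equally long match
            # never replaces an earlier one.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical, simplified stand-ins for the real lexer rules.
ID = ("ID", r"[A-Za-z_][A-Za-z_0-9]*")
DAYZ = ("DAYZ", r"DAYZ")
LPAREN = ("LPAREN", r"\(")
RPAREN = ("RPAREN", r"\)")

id_first = [ID, DAYZ, LPAREN, RPAREN]    # DAYZ defined after ID
dayz_first = [DAYZ, ID, LPAREN, RPAREN]  # DAYZ defined before ID

print(tokenize("DAYZ(d)", id_first)[0])    # first token is typed as ID
print(tokenize("DAYZ(d)", dayz_first)[0])  # first token is typed as DAYZ
```

With DAYZ listed first, a longer identifier such as DAYZX still comes out as ID, because the longest match is preferred before rule order is considered.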

Related

How to fix mismatch input x expecting y

I am new to ANTLR and to creating parse trees. I am trying to create tokens that include a special character, but when I do so I get an input mismatch.
I have tried to add a special character to my lexer rules by adding a '.' at the end, but when I do so I get an input mismatch error. The snippet I am trying works on its own, but not as part of the entire grammar.
This is the code I have so far...
grammar Grammar4;
r : WORD', 'NUMBER', 'BOOL', 'SENT+;
BOOL : 'true' | 'false';
WORD : [a-zA-Z]+;
NUMBER : [0-9]+;
SENT : [a-zA-Z ]+;
WS : [ \t\r\n]+ -> skip ;
If I add a period at the end of SENT to allow for special characters ([a-zA-Z ]+.;), then I get an input mismatch. If I take that line out and use it independently of the rest, then I can have a sentence like "How are you today!" and have it tokenize fine. Any help is greatly appreciated.
Edited for clarity:
I am trying to parse a statement like: Alex, 31, false, I let the dog out! (Note that I can get everything to parse as an individual token except the last special character, and I would like "I let the dog out!" to be one token.)

Xtext disambiguation

Given the following grammar:
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Program:
{Range} ID '.' '.' ID
| {Group} ID ID ID ID
;
terminal ID:
'a' | '.'
;
and the following input:
a . . a
I would argue that there are two ways in which the string can be parsed: as a Range (the first alternative) or as a Group (the second alternative). When I try this in my generated IDE and inspect the Ecore model, a Range is instantiated.
What makes Xtext decide in favor of the Range?
Edit: specifically, I'm wondering why the Xtext grammar itself is not ambiguous, since a range 'a'..'z' can be parsed as either a Group of Keyword, Wildcard, Wildcard, Keyword or as a CharacterRange of Keyword, Keyword.
Keywords become lexer rules as well. Thus you have two lexer rules:
terminal FULL_STOP_KEYWORD: '.' ;
and
terminal ID: 'a' | '.';
The lexer is not stateful. Only one rule can win. Thus '.' will always be lexed as a keyword and never as an ID.

Rule reference is not currently supported in a set in ANTLR4 Grammar

I am trying to port Chris Lambro's ANTLR3 Javascript Grammar to ANTLR4
I am getting the following error,
Rule reference 'LT' is not currently supported in a set
in the following code ~(LT)*
LineComment
: '//' ~(LT)* -> skip
;
LT : '\n' // Line feed.
| '\r' // Carriage return.
| '\u2028' // Line separator.
| '\u2029' // Paragraph separator.
;
I need help understanding why I am getting this error and how I can solve it.
The ~ operator in ANTLR inverts a set of symbols (characters in the lexer, or tokens in the parser). Inside the set, you have a reference to the LT lexer rule, which is not currently supported in ANTLR 4. To resolve the problem, you need to inline the rule reference:
LineComment
: '//' ~([\n\r\u2028\u2029])* -> skip
;
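As a rough illustration of what the inlined negated set matches, the same idea can be written as a Python regular expression: `//` followed by any run of characters that are not one of the four line terminators. This is only a sketch for checking the character set; the names `LINE_COMMENT` and `strip_line_comments` are hypothetical, and unlike a real lexer it does not know about string literals.

```python
import re

# "//" followed by anything except the four line terminators from the LT rule:
# line feed, carriage return, U+2028 and U+2029.
LINE_COMMENT = re.compile("//[^\n\r\u2028\u2029]*")

def strip_line_comments(source):
    """Remove // comments up to, but not including, the line terminator."""
    return LINE_COMMENT.sub("", source)

print(repr(strip_line_comments("a = 1; // note\nb = 2;")))
```

The terminator itself is left in place, which mirrors the grammar: the comment rule consumes everything up to the end of the line, and the line break is handled by a different rule.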

SQL functions to match last two parts of a URL

This is a follow up question to a post here Regex for matching last two parts of a URL
I was wondering if I could use built-in SQL functions to accomplish the same type of pattern match without using regular expressions. In particular, I was wondering whether there is a way to reverse the string, say www.stackoverflow.com to com.stackoverflow.www, and then concatenate split('com.stackoverflow.www', '.', 1) || split('com.stackoverflow.www', '.', 2). Then I would be done, but I am not sure if this is possible.
Here is the general problem description:
I am trying to figure out the best SQL function to match only the last two strings separated by a . in a URL.
For instance with www.stackoverflow.com I just want to match stackoverflow.com
The issue I have is that some strings can have a large number of periods, for instance
a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com
should also return only yimg.com
The set of URLS I am working with does not have any of the path information so one can assume the last part of the string is always .org or .com or something of that nature.
What sql functions will return stackoverflow.com when run against www.stackoverflow.com and will return yimg.com when run against a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com under the conditions stated above? I did not want to use regular expressions in the solution just sql string manipulation functions.
It doesn't look like you can do it with the 8.3 function set: http://www.postgresql.org/docs/8.3/static/functions-string.html
There's no reverse there; it arrives in 9.1. With it you could do:
select reverse(split_part(reverse(data), '.', 2)) || '.'
|| reverse(split_part(reverse(data), '.', 1))
from example;
See http://sqlfiddle.com/#!1/25e43/2/0
You can also declare your own reverse (http://a-kretschmer.de/diverses.shtml) and then solve the problem the same way.
But regex is just easier...
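To see why the reverse/split_part combination works, the same steps can be traced in a few lines of Python. This is only a sketch that mirrors the query above; `split_part` here is a hand-written imitation of the PostgreSQL function, and `last_two_labels` is a hypothetical helper name.

```python
def split_part(text, delimiter, n):
    """Imitate PostgreSQL split_part: 1-based field, '' when n is out of range."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""

def last_two_labels(host):
    # Reversing the string turns "the last two fields" into "the first two
    # fields", which split_part can address by position.
    rev = host[::-1]
    second_last = split_part(rev, ".", 2)[::-1]
    last = split_part(rev, ".", 1)[::-1]
    return second_last + "." + last

print(last_two_labels("www.stackoverflow.com"))
print(last_two_labels("a-abcnewsplus.i-a277eea3.rtmp.atlas.cdn.yimg.com"))
```

Note that a value without any period yields a leading "." in this sketch, since the second field is empty; the question rules that case out by assumption.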

Avoid Custom Terminals Hiding (Suppressing) Derived Ones

I started playing around with xtext a few days ago and just went through the tutorials. Maybe the solution has been covered in the reference somewhere but I cannot get it right quickly.
My problem is this. I tried to write a simple grammar which mixed in org.eclipse.xtext.common.Terminals. Then I wanted to insert a custom terminal FILE_NAME like this:
terminal FILE_NAME:
( !('/' | '\\' | ':' | '*' | '?' | '"' | '<' | '>' | '|') )+
;
That's basically what a filename is allowed to be under Windows. However, by doing that, inherited rules like ID, INT, etc. would never be matched, because they are always generated after custom terminals.
Can that kind of problem be avoided gracefully (as repeatless as possible and as general as possible)? Thanks in advance!
Terminal rules (aka lexer rules) are used to tokenize the input sequence. IMHO there should be a minimum of semantics in terminal rules.
You are trying to express a specialized parser rule which accepts only valid file names.
Have a look at parser phases described in the Xtext Documentation [1]. My suggestion:
Lexing: Instead of using a specialized terminal rule go with STRING.
Validation: Write a validation rule for an EClass with a 'fileName' EAttribute.
as repeatless as possible and as general as possible
You don't want to repeat your validation for every EClass with a 'fileName' EAttribute. Introduce a new super type with a 'fileName' EAttribute if you have a refined Ecore model.
Then you can implement one general validation rule #check_fileName_is_valid(ElementWithFile).
And if you don't have a refined MM use meta model hints within your grammar. If you provide a generalized super type Xtext's Ecore inferrer will pull up common features of the subtypes. Ex:
ElementWithFile: A | B;
A: ... 'file' fileName=STRING ...;
B: ... 'file' fileName=STRING ...;
// => Ecore: ElementWithFile.fileName<EString>
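The check that moves from the terminal rule into the validation phase is small. In Xtext it would normally be written in Java or Xtend inside the validator class; the Python below is only a language-neutral sketch of the logic, with the forbidden characters taken from the FILE_NAME terminal in the question. The function name and the error messages are assumptions.

```python
# Characters excluded by the FILE_NAME terminal in the question.
FORBIDDEN = set('/\\:*?"<>|')

def check_file_name_is_valid(file_name):
    """Return a list of error messages; an empty list means the name is valid."""
    errors = []
    if not file_name:
        errors.append("file name must not be empty")
    bad = sorted(set(file_name) & FORBIDDEN)
    if bad:
        errors.append("file name contains forbidden characters: " + " ".join(bad))
    return errors

print(check_file_name_is_valid("report.txt"))
print(check_file_name_is_valid("a:b?"))
```

Because the lexer now only sees an ordinary STRING, inherited terminals such as ID and INT keep working, and the error is reported on the fileName value rather than as a lexing failure.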
[1] http://www.eclipse.org/Xtext/documentation.html#DSL