Rule reference is not currently supported in a set in ANTLR4 Grammar - porting

I am trying to port Chris Lambro's ANTLR3 JavaScript grammar to ANTLR4, and I am getting the following error:
Rule reference 'LT' is not currently supported in a set
It points at ~(LT)* in the following code:
LineComment
: '//' ~(LT)* -> skip
;
LT : '\n' // Line feed.
| '\r' // Carriage return.
| '\u2028' // Line separator.
| '\u2029' // Paragraph separator.
;
I need help understanding why I am getting this error and how I can solve it.

The ~ operator in ANTLR inverts a set of symbols (characters in the lexer, or tokens in the parser). Inside the set, you have a reference to the LT lexer rule, which is not currently supported in ANTLR 4. To resolve the problem, you need to inline the rule reference:
LineComment
: '//' ~([\n\r\u2028\u2029])* -> skip
;
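As a rough illustration of what the inlined set matches, here is a minimal Python sketch, assuming a regex negated character class is an acceptable stand-in for ANTLR's ~[...] set. The names LINE_COMMENT and strip_line_comments are hypothetical and not part of the grammar.

```python
import re

# Python analogue of the lexer rule  '//' ~[\n\r\u2028\u2029]*
# The negated character class plays the role of ANTLR's ~[...] set:
# it matches any character that is NOT one of the four line terminators.
LINE_COMMENT = re.compile(r"//[^\n\r\u2028\u2029]*")


def strip_line_comments(source: str) -> str:
    """Remove // comments but leave the line terminators in place."""
    return LINE_COMMENT.sub("", source)


sample = "var a = 1; // first\u2028var b = 2; // second\nvar c;"
print(repr(strip_line_comments(sample)))
```

The sketch ignores string literals, so it only demonstrates the character set, not a full JavaScript lexer.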

Related

Why does a similar rule in the ANTLR grammar file produce a completely different tree?

I am using the grammar file at https://github.com/antlr/grammars-v4/blob/master/sql/tsql/TSqlParser.g4. It has a built_in_functions grammar rule. I want to parse a new function, DAYZ, as a built-in function. I introduced it in the .g4 file like this:
built_in_functions
// https://msdn.microsoft.com/en-us/library/ms173784.aspx
: BINARY_CHECKSUM '(' '*' ')' #BINARY_CHECKSUM
// https://msdn.microsoft.com/en-us/library/ms186819.aspx
| DATEADD '(' datepart=ID ',' number=expression ',' date=expression ')' #DATEADD
| DAYZ '(' date=expression ')' #DAYZ
When I use grun to test the grammar, I get unexpected results for DAYZ. For a DATEDIFF I get what I expect.
For DAYZ, I get the following tree
Why does the parser not treat DAYZ as satisfying the rule built_in_functions like it does for DATEDIFF ? If the parser recognizes DAYZ eventually as an _Id, it should do the same for DATEDIFF. There must be something wrong in the way I am introducing DAYZ into the grammar but I can't figure it out. Any help appreciated. And apologies if I am not using the correct ANTLR terminology. I am a newbie to ANTLR.
I am using antlr-4.9.2-complete.jar
Move your lexer rule for DAYZ so that it appears before the ID rule in the TSqlLexer.g4 file.
Since the id_ rule is recognizing the token, it must have been tokenized as an ID token. This happens when your DAYZ rule definition comes after the ID rule definition.
When ANTLR finds two lexer rules that match the same string of input characters (e.g. "DAYZ"), it uses whichever rule appears first in the grammar.
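The tie-breaking behaviour can be sketched with a toy lexer in Python. This is only an illustration under simplifying assumptions: the rule lists and patterns below are hypothetical stand-ins, not the actual contents of TSqlLexer.g4, and the function tokenize is not part of ANTLR.

```python
import re


def tokenize(text, rules):
    """Tiny lexer: the longest match wins; ties go to the rule listed first."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


ID = ("ID", r"[A-Za-z_][A-Za-z_0-9]*")
DAYZ = ("DAYZ", r"DAYZ")
PUNCT = [("LPAREN", r"\("), ("RPAREN", r"\)")]

id_first = [ID, DAYZ] + PUNCT       # keyword rule declared after ID
keyword_first = [DAYZ, ID] + PUNCT  # keyword rule declared before ID

print(tokenize("DAYZ(x)", id_first))
print(tokenize("DAYZ(x)", keyword_first))
```

With id_first the input DAYZ comes out as an ID token, and with keyword_first it comes out as a DAYZ token. A longer identifier such as DAYZX is still an ID in both orderings, because the longest match is preferred before rule order is considered.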

When I run a file that begins with "#!/usr/bin/perl -w", I get an error: "syntax error at line 153, near "=~ ?""

When I run a file that begins with #!/usr/bin/perl -w, I get an error:
syntax error at line 153, near "=~ ?"
If I use "#!/bin/bash" instead, this error does not appear, but I get another error:
"line 34: syntax error near unexpected token `('"
Line 153 in my file:
($output_volume =~ ?^([\S]+).mnc?) && ($base_name = $1) ||
die "sharpen_volume failed: output volume does not appear to be"
." a minc volume.\n";
Line 34 in my file:
use MNI::Startup qw(nocputimes);
$output_volume =~ ?^([\S]+).mnc?
This used to be valid Perl and thus might appear in old code and instructional material.
From perlop:
In the past, the leading m in m?PATTERN? was optional, but omitting it would produce a deprecation warning. As of v5.22.0, omitting it produces a syntax error. If you encounter this construct in older code, you can just add m.
That is Perl code so the first error message is meaningful.
With delimiters other than // in the match operator you must have the explicit m for it, so
$output_volume =~ m?^([\S]+).mnc?
It is only with // delimiters that the m may be omitted; from Regex Quote-Like Operators (perlop)
If "/" is the delimiter then the initial m is optional.
See perlretut for a tutorial introduction to regex and perlre for reference.
Also note that the particular delimiters of ? trigger specific regex behavior in a special case. This is discussed at the end of the documentation section in perlop linked above.
You already have two answers that explain the problem.
? ... ? is no longer valid syntax for a match operator. You need m? ... ? instead.
Until Perl 5.22, your syntax generated a warning. Now it's a fatal error (which is what you are seeing). So I assume you're now running this on a more recent version of Perl.
There are, however, a few other points it is probably worth making.
You say you tried to investigate this by changing the first line of your file from #!/usr/bin/perl -w to #!/bin/bash. I'm not sure how you think this was going to help. This line defines the program that is used to run your code. As you have Perl code, you need to run it with Perl. Trying to run it with bash is very unlikely to be useful.
The m? ... ? (or, formerly, ? ... ?) syntax triggers an obscure and specialised behaviour. It seems to me that this behaviour isn't required in your case, so you can probably change it to the more usual / ... /.
Your regex contains an unescaped dot character. Given that you seem to be extracting the basename from a filename that has an extension, it seems likely that this should be escaped (using \.) so that it matches an actual dot (rather than any character).
If you are using this code to extract a file's basename, then using a regex probably isn't the best approach. Perhaps take a look at File::Basename instead.
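The effect of the unescaped dot can be checked quickly. The sketch below uses Python rather than Perl, on the assumption that this particular pattern behaves the same way in both regex flavours. The file names are made up for illustration, and os.path stands in here for Perl's File::Basename.

```python
import os.path
import re

unescaped = re.compile(r"^(\S+).mnc")   # '.' matches any character
escaped = re.compile(r"^(\S+)\.mnc")    # '\.' matches only a literal dot

# A name with no dot at all still satisfies the unescaped pattern.
print(bool(unescaped.search("brainXmnc")))
print(bool(escaped.search("brainXmnc")))

# With the escaped dot, the capture group holds the base name.
print(escaped.search("brain.mnc").group(1))

# Library alternative: split the extension off without a regex.
base, ext = os.path.splitext(os.path.basename("/data/scans/brain.mnc"))
print(base, ext)
```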

How to fix mismatch input x expecting y

I am new to ANTLR and to creating parse trees. I am trying to create tokens that include a special character, but when I do so I get an input mismatch.
I have tried to add a special character to my lexer rules by adding a '.' at the end, but when I do so I get an input mismatch error. The snippet of code that I am trying works on its own but not as part of the entire grammar.
This is the code I have so far...
grammar Grammar4;
r : WORD', 'NUMBER', 'BOOL', 'SENT+;
BOOL : 'true' | 'false';
WORD : [a-zA-Z]+;
NUMBER : [0-9]+;
SENT : [a-zA-Z ]+;
WS : [ \t\r\n]+ -> skip ;
If I add a period at the end of SENT to allow for special characters ([a-zA-Z ]+.;), I get an input mismatch. If I take that line out and use it independently of the rest, then I can have a sentence like "How are you today!" and have it tokenize fine. Any help is greatly appreciated.
Edited for clarity:
I am trying to parse a statement like: Alex, 31, false, I let the dog out! (Note that I can get everything to parse as an individual token except the last special character, and I would like "I let the dog out!" to be one token.)

Drools viable input error

I used SpreadsheetCompiler to extract the drl for my Decision Table. Here is the relevant bit
global Integer netincome;
// rule values at C14, header at C8
rule "Net Income_14"
salience 65522
when
user:CSUserBundle(user.grossHouseholdIncome >= 0, user.grossHouseholdIncome < 1150000, user.grossHouseholdIncome >= 15700*52, user.grossHouseholdIncome < 86600*52)
then
netincome = eval(user.grossHouseholdIncome - 0 - (user.grossHouseholdIncome – 816400) * 0.12 - 0)
end
My error is:
E 14:35:30:235 : main : org.drools.compiler.kie.builder.impl.AbstractKieModule : Unable to build KieBaseModel:defaultKieBase
[11,78]: [ERR 101] Line 11:78 no viable alternative at input ''
Unfortunately, column 78, the column given in the error, is in the middle of the second user.grossHouseholdIncome in the 'then' part. I searched through the documentation but could not find anything about using a variable name twice in the text. I tried adding the 'eval' in response to De Smet's suggestion for the same error. Any ideas?
What I did was copy-paste the rule into a decent text editor and then search for all occurrences of special ASCII characters like the quote (") or hyphen (-), or anything else that the marvellous office programs are apt to convert into some Unicode glyph that looks good but is rejected by compilers. Also, do not trust spaces. Frequently they are optical illusions created by a program due to some TAB character. Below, I have replaced the leading spaces with a single underscore to represent a TAB, and now the 78 aligns exactly with the evil character.
_netincome = eval(user.grossHouseholdIncome - 0 - (user.grossHouseholdIncome – 816400) * 0.12 - 0)
....5...10....5...20....5...30....5...40....5...50....5...60....5...70....5...80
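The same search can be automated. Below is a minimal Python sketch, assuming the rule text is available as a string with a single leading TAB, as in the ruler above. The function name find_non_ascii is hypothetical.

```python
def find_non_ascii(text):
    """Return (line, column, char, code point) for each non-ASCII character (1-based)."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch, f"U+{ord(ch):04X}"))
    return hits


# The 'then' line from the rule, with one leading TAB and the en dash (U+2013)
# that the spreadsheet program substituted for a hyphen.
rule_line = (
    "\tnetincome = eval(user.grossHouseholdIncome - 0 - "
    "(user.grossHouseholdIncome \u2013 816400) * 0.12 - 0)"
)

for hit in find_non_ascii(rule_line):
    print(hit)
```

For this line the only hit is the en dash (U+2013) at column 78, which matches the column reported in the error message.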

.... undeclared (first use in this function)?

I have some simple code in the lex language, and I generate lex.yy.c with Flex.
When I try to compile lex.yy.c to an .exe file, I get errors like "undeclared (first use in this function)". From searching the web, I understand that I need a Const.h file, so I want to generate that file.
How can I do this?
Some of the errors:
35 C:\Users\Majid\Desktop\win\lex.l `STRING' undeclared (first use in this function)
38 C:\Users\Majid\Desktop\win\lex.l `LC' undeclared (first use in this function)
39 C:\Users\Majid\Desktop\win\lex.l `LP' undeclared (first use in this function)
....
The beginning of the code is:
%{
int stk[20],stk1[20];
int lbl=0,wlbl=0;
int lineno=0;
int pcount=1;
int lcount=0,wlcount=0;
int token=100;
int dtype=0;
int count=0;
int fexe=0;
char c,str[20],str1[10],idename[10];
char a[100];
void eatcom();
void eatWS();
int set(int);
void check(char);
void checkop();
int chfunction(char *);
%}
Digit [0-9]
Letter [a-zA-Z]
ID {letter}({letter}|{digit})*
NUM {digit}+
Delim [ \t]
A [A-Za-z]]
%%
"/*" eatcom();
"//"(.)*
\"(\\.|[^\"])*\" return (STRING);
\"(\\.|[^\"])*\n printf("Adding missing \" to sting constant");
"{" {a[count++]='{';fexe=0;eatWS();return LC;}
"(" {a[count++]='(';eatWS();return LP;}
"[" {a[count++]='[';eatWS();return LB;}
"}" {check('{');eatWS();
if(cflag)
{
//stk[lbl++]=lcount++;
fprintf(fc,"lbl%d:\n",stk[--lbl]);
//stk[lbl++]=lcount++;
printf("%d\n",stk[lbl]);
cflag=0;
}
return RC;
}
"]" {check('[');eatWS();return RB;}
")" {check('(');eatWS();return RP;}
"++" | "--" return INCOP;
[~!] return UNOP;
"*" {eatWS();return STAR;}
[/%] {eatWS();return DIVOP;}
"+" {eatWS();return PLUS;}
"-" {eatWS();return MINUS;}
You need a .h file with the definitions. You can write it by hand, but typically this file is generated by Bison. The two tools Flex and Bison are very often used together.
Bison is a parser-generator. Its input is a file where you have written a grammar that describes the syntax of a language, and Bison generates a parser. The parser (or "syntactical analyzer") is the part of a compiler (or similar tool) that analyzes input according to the syntax of the language. For example, it is the parser that knows that an if statement can, but doesn't have to, have an else part.
Flex is a scanner-generator, and from a file with regular expressions it creates a scanner. The scanner (or "lexical analyzer") is the part of a compiler (or similar tool) that analyzes input and divides it up into "tokens". A token can be a keyword such as if, an operator such as +, an integer constant, etcetera. It is the scanner that for example knows that an integer constant is written as a sequence of one or more digits.
The scanner reports to the parser when it has found a token. For example, if the input starts with 123, the scanner might recognize that this is an integer constant, and report this to the parser. In the case of Flex and Bison, it does this by returning the token code for integer constant, which might (just as an example) be 17. But since the scanner and parser must agree on these token codes, they need common definitions. Bison will generate token codes, and if given the flag -d it will output them in a .h file.
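The idea of shared token codes can be sketched in a few lines of Python. This is only an illustration of the division of labour, not how Flex and Bison are implemented: the Token enum stands in for the generated .h file, its numeric values are arbitrary placeholders, and scan and parse_sum are hypothetical names.

```python
import re
from enum import IntEnum


class Token(IntEnum):
    """Stand-in for the generated header: codes shared by scanner and parser."""
    NUMBER = 258  # placeholder value
    PLUS = 259    # placeholder value


def scan(text):
    """Scanner: split the input into (token code, lexeme) pairs."""
    pos = 0
    pattern = re.compile(r"\s+|\d+|\+")
    while pos < len(text):
        m = pattern.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        lexeme, pos = m.group(), m.end()
        if lexeme.isspace():
            continue  # whitespace is skipped, never reported to the parser
        yield (Token.PLUS if lexeme == "+" else Token.NUMBER, lexeme)


def parse_sum(tokens):
    """Parser for NUMBER (PLUS NUMBER)*; it decides using token codes only."""
    tokens = list(tokens)
    if not tokens or tokens[0][0] != Token.NUMBER:
        raise SyntaxError("expected NUMBER")
    total, i = int(tokens[0][1]), 1
    while i < len(tokens):
        if (tokens[i][0] != Token.PLUS or i + 1 >= len(tokens)
                or tokens[i + 1][0] != Token.NUMBER):
            raise SyntaxError("expected '+ NUMBER'")
        total += int(tokens[i + 1][1])
        i += 2
    return total


print(parse_sum(scan("123 + 4")))
```

If the Token definitions were missing, the scanner would fail on names like Token.NUMBER, which is the Python analogue of the "undeclared" errors in the question.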
Thomas Niemann's A Compact Guide to Lex & Yacc gives a good introduction to how to use Flex and Bison. (Lex and Yacc are the old, original tools, and Flex and Bison are new, free versions of the same tools.)
For the yacc and lex part, this error disappeared for me when I ran the command yacc -d xyz.y, where -d is the flag and xyz.y is my yacc file.