How do I display all pronouns in a sentence and their persons using antlr - eclipse

EDITED according to WayneH's grammar
Here's what I have in my grammar file:
grammar pfinder;
options {
language = Java;
}
sentence
: ((words | pronoun) SPACE)* ((words | pronoun) ('.' | '?'))
;
words
: WORDS {System.out.println($text);};
pronoun returns [String value]
: sfirst {$value = $sfirst.value; System.out.println($sfirst.text + '(' + $sfirst.value + ')');}
| ssecond {$value = $ssecond.value; System.out.println($ssecond.text + '(' + $ssecond.value + ')');}
| sthird {$value = $sthird.value; System.out.println($sthird.text + '(' + $sthird.value + ')');}
| pfirst {$value = $pfirst.value; System.out.println($pfirst.text + '(' + $pfirst.value + ')');}
| psecond {$value = $psecond.value; System.out.println($psecond.text + '(' + $psecond.value + ')');}
| pthird{$value = $pthird.value; System.out.println($pthird.text + '(' + $pthird.value + ')');};
sfirst returns [String value] : ('i' | 'me' | 'my' | 'mine') {$value = "s1";};
ssecond returns [String value] : ('you' | 'your'| 'yours'| 'yourself') {$value = "s2";};
sthird returns [String value] : ('he' | 'she' | 'it' | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself') {$value = "s3";};
pfirst returns [String value] : ('we' | 'us' | 'our' | 'ours') {$value = "p1";};
psecond returns [String value] : ('yourselves') {$value = "p2";};
pthird returns [String value] : ('they'| 'them'| 'their'| 'theirs' | 'themselves') {$value = "p3";};
WORDS : LETTER*;// {$channel=HIDDEN;};
SPACE : (' ')?;
fragment LETTER : ('a'..'z' | 'A'..'Z');
And here's what I have in a Java test class:
import java.util.Scanner;
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import java.util.List;
public class test2 {
public static void main(String[] args) throws RecognitionException {
String s;
Scanner input = new Scanner(System.in);
System.out.println("Enter a Sentence: ");
s=input.nextLine().toLowerCase();
ANTLRStringStream in = new ANTLRStringStream(s);
pfinderLexer lexer = new pfinderLexer(in);
TokenStream tokenStream = new CommonTokenStream(lexer);
pfinderParser parser = new pfinderParser(tokenStream);
parser.pronoun();
}
}
What do I need to put in the test file so that it will display all the pronouns in a sentence and their respective values (s1, s2, ...)?

In case you are trying to do some sort of high-level analysis of spoken/written language, you might consider using some sort of natural language processing tool. For example, TagHelper Tools will tell you which elements are pronouns (and verbs, and nouns, and adverbs, and other esoteric grammatical constructs). (THT is the only tool of that sort that I'm familiar with, so don't take that as a particular endorsement of awesomeness).

Fragments don't create tokens, and placing them in parser rules will not give the desired results.
On my test box, this produced (I think!) the desired result:
program :
PRONOUN+
;
PRONOUN :
'i' | 'me' | 'my' | 'mine'
| 'you' | 'your'| 'yours'| 'yourself'
| 'he' | 'she' | 'it' | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself'
| 'we' | 'us' | 'our' | 'ours'
| 'yourselves'
| 'they'| 'them'| 'their'| 'theirs' | 'themselves'
;
WS : ' ' { $channel = HIDDEN; };
WORD : ('A'..'Z'|'a'..'z')+ { $channel = HIDDEN; };
In ANTLRWorks, the sample input "i kicked you" returned the tree structure: program -> [i, you].
I feel compelled to point out that ANTLR is overkill for stripping the pronouns out of a sentence; consider using a regular expression. A few caveats about this grammar: it is case sensitive, expanding WORD to consume everything except your dictionary of PRONOUNs (punctuation, etc.) may be a bit tedious, and the input will need to be sanitized.
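As a rough sketch of the regular-expression route, assuming the person codes (s1 ... p3) from the question's grammar, something like the following Java could work. The class name PronounFinder and its methods are made up for illustration; nothing here comes from ANTLR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of ANTLR): tags pronouns with the person
// codes s1..p3 used in the question's grammar.
final class PronounFinder {
    private static final Map<String, String> PERSON = new LinkedHashMap<>();

    private static void put(String code, String... words) {
        for (String w : words) PERSON.put(w, code);
    }

    static {
        put("s1", "i", "me", "my", "mine");
        put("s2", "you", "your", "yours", "yourself");
        put("s3", "he", "she", "it", "his", "hers", "its", "him", "her", "himself", "herself");
        put("p1", "we", "us", "our", "ours");
        put("p2", "yourselves");
        put("p3", "they", "them", "their", "theirs", "themselves");
    }

    // Scan whole words only, so "it" inside "kitten" is not reported.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    static List<String> find(String sentence) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(sentence);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            String person = PERSON.get(word);
            if (person != null) found.add(word + "(" + person + ")");
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(find("I kicked you and them.")); // [i(s1), you(s2), them(p3)]
    }
}
```

Lower-casing each word before the lookup sidesteps the case-sensitivity caveat, and anything that is not a run of letters is simply skipped.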
--- Edit: In response to the second OP:
I have altered the original grammar to make parsing easier. The new grammar is:
grammar pfinder;
options {
backtrack=true;
output = AST;
}
tokens {
PROGRAM;
}
program :
(WORD* p+=PRONOUN+ WORD*)*
-> ^(PROGRAM $p*)
;
PRONOUN :
'i' | 'me' | 'my' | 'mine'
| 'you' | 'your'| 'yours'| 'yourself'
| 'he' | 'she' | 'it' | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself'
| 'we' | 'us' | 'our' | 'ours' | 'yourselves'
| 'they'| 'them'| 'their'| 'theirs' | 'themselves'
;
WS : ' ' { $channel = HIDDEN; };
WORD : ('A'..'Z'|'a'..'z')+;
I'll explain the changes:
Backtracking is now required to resolve the parser rule program. Perhaps there's a better way to write it that doesn't require backtracking, but this is the first thing that popped into my mind.
An imaginary token PROGRAM has been defined to group our pronouns.
Each matched PRONOUN is added to the ANTLR list label $p and is rewritten into the AST under the imaginary token.
The interpreter code may now use a CommonTree to collect the matched pronouns.
The following is written in C# (I don't know Java) but I wrote it with the intent that you'll be able to read and understand it.
static object[] ReadTokens( string text )
{
ArrayList results = new ArrayList();
pfinderLexer Lexer = new pfinderLexer(new Antlr.Runtime.ANTLRStringStream(text));
pfinderParser Parser = new pfinderParser(new CommonTokenStream(Lexer));
// syntaxTree is imaginary token {PROGRAM},
// its children are the pronouns collected by $p in grammar.
CommonTree syntaxTree = Parser.program().Tree as CommonTree;
if ( syntaxTree == null ) return null;
foreach ( object pronoun in syntaxTree.Children )
{
results.Add(pronoun.ToString());
}
return results.ToArray();
}
Calling ReadTokens("i kicked you and them") returns array ["i", "you", "them"]

I think you need to learn more about lexer rules in ANTLR. Lexer rules start with an uppercase letter and generate tokens for the stream the parser will look at. Lexer fragment rules do not generate a token for the stream, but they help other lexer rules generate tokens; look at the lexer rules WORDS and LETTER (LETTER is not a token, but it helps WORDS create one).
Now, when a text literal is put into a parser rule (a rule whose name starts with a lowercase letter), that literal also becomes a valid token that the lexer will identify and pass on (at least in ANTLR; I have not used any similar tools, so I can't answer for them).
The next thing I noticed is that your 's' and 'pronoun' rules appear to be the same thing. I commented out the 's' rule and put everything into the 'pronoun' rule.
The last thing is to learn how to put actions into the grammar; you have some in the 's' rule, setting the return value. I made the pronoun rule return a string value so that, if you wanted the actions in your 'sentence' rule, you could easily accomplish your "-i pronoun" comment/answer.
Since I do not know exactly what results you want, I played with your grammar, made some slight modifications, reorganized it (moving what I thought were parser rules to the top and keeping all lexer rules at the bottom), and put in some actions that I think will show you what you need. There could be several different ways to accomplish this, and I don't think my solution is perfect for any of your possible goals, but here is a grammar I was able to get working in ANTLRWorks:
grammar pfinder;
options {
language = Java;
}
sentence
: ((words | pronoun) SPACE)* ((words | pronoun) ('.' | '?'))
;
words
: WORDS {System.out.println($text);};
pronoun returns [String value]
: sfirst {$value = $sfirst.value; System.out.println($sfirst.text + '(' + $sfirst.value + ')');}
| ssecond {$value = $ssecond.value; System.out.println($ssecond.text + '(' + $ssecond.value + ')');}
| sthird {$value = $sthird.value; System.out.println($sthird.text + '(' + $sthird.value + ')');}
| pfirst {$value = $pfirst.value; System.out.println($pfirst.text + '(' + $pfirst.value + ')');}
| psecond {$value = $psecond.value; System.out.println($psecond.text + '(' + $psecond.value + ')');}
| pthird{$value = $pthird.value; System.out.println($pthird.text + '(' + $pthird.value + ')');};
//s returns [String value]
// : exp=sfirst {$value = "s1";}
// | exp=ssecond {$value = "s2";}
// | exp=sthird {$value = "s3";}
// | exp=pfirst {$value = "p1";}
// | exp=psecond {$value = "p2";}
// | exp=pthird {$value = "p3";}
// ;
sfirst returns [String value] : ('i' | 'me' | 'my' | 'mine') {$value = "s1";};
ssecond returns [String value] : ('you' | 'your'| 'yours'| 'yourself') {$value = "s2";};
sthird returns [String value] : ('he' | 'she' | 'it' | 'his' | 'hers' | 'its' | 'him' | 'her' | 'himself' | 'herself') {$value = "s3";};
pfirst returns [String value] : ('we' | 'us' | 'our' | 'ours') {$value = "p1";};
psecond returns [String value] : ('yourselves') {$value = "p2";};
pthird returns [String value] : ('they'| 'them'| 'their'| 'theirs' | 'themselves') {$value = "p3";};
WORDS : LETTER*;// {$channel=HIDDEN;};
SPACE : (' ')?;
fragment LETTER : ('a'..'z' | 'A'..'Z');
I think this grammar will show you how to accomplish what you are trying to do, although it will need modification no matter what your end goal is.
Good luck.
I think you only have to change one line in your test class,
parser.pronoun();
to:
parser.sentence();
You might want to change a few other things in the grammar as well:
SPACE : ' ';
sentence: (words | pronoun) (SPACE (words | pronoun))* ('.' | '?'); // then you might want to put a rule between sentence and words/pronoun.


Antlr4 can't recognize a single number and bracket. I don't know what the problem is?

lexer grammar TransformLexer;
@header { package com.abc.g4.gen; }
channels { DPCOMMENT, ERRORCHANNEL }
@members {
/**
* Verify whether current token is a valid decimal token (which contains dot).
* Returns true if the character that follows the token is not a digit or letter or underscore.
*
* For example:
* For char stream "2.3", "2." is not a valid decimal token, because it is followed by digit '3'.
* For char stream "2.3_", "2.3" is not a valid decimal token, because it is followed by '_'.
* For char stream "2.3W", "2.3" is not a valid decimal token, because it is followed by 'W'.
* For char stream "12.0D 34.E2+0.12 " 12.0D is a valid decimal token because it is followed
* by a space. 34.E2 is a valid decimal token because it is followed by symbol '+'
* which is not a digit or letter or underscore.
*/
public boolean isValidDecimal() {
int nextChar = _input.LA(1);
if (nextChar >= 'A' && nextChar <= 'Z' || nextChar >= '0' && nextChar <= '9' ||
nextChar == '_') {
return false;
} else {
return true;
}
}
}
// SKIP
SPACE: [ \t\r\n]+ -> channel(HIDDEN);
SPEC_MYSQL_COMMENT: '/*!' .+? '*/' -> channel(DPCOMMENT);
COMMENT_INPUT: '/*' .*? '*/' -> channel(HIDDEN);
LINE_COMMENT: (
('--' [ \t] | '#') ~[\r\n]* ('\r'? '\n' | EOF)
| '--' ('\r'? '\n' | EOF)
) -> channel(HIDDEN);
STRING
: DQUOTA_STRING
;
EQ : '==';
NEQ : '<>';
NEQJ: '!=';
LT : '<';
LTE : '<=';
GT : '>';
GTE : '>=';
PLUS: '+';
MINUS: '-';
ASTERISK: '*';
SLASH: '/' ;
PERCENT: '%';
RSHIFT: '>>';
LSHIFT: '<<';
IS: 'IS' | 'is';
NULL: 'NULL' | 'null';
TRUE: 'TRUE' | 'true';
FALSE: 'FALSE' | 'false';
LIKE: 'LIKE' | 'like';
OR: 'OR' | 'or' | '|';
AND: 'AND' | '&&' | 'and' | '&';
IN: 'IN' | 'in';
NOT: 'NOT' | '!' | 'not';
CASE: 'CASE' | 'case';
WHEN: 'WHEN' | 'when';
THEN: 'THEN' | 'then';
ELSE: 'ELSE' | 'else';
END: 'END' | 'end';
JOIN: '||';
ID: [#]ID_LITERAL+;
// DOUBLE_QUOTE_ID: '"' ~'"'+ '"';
REVERSE_QUOTE_ID: '`' ~'`'+ '`';
NAME: ID_LITERAL+;
fragment ID_LITERAL: [a-zA-Z_0-9\u0080-\uFFFF]*?[a-zA-Z_$\u0080-\uFFFF]+?[a-zA-Z_$0-9\u0080-\uFFFF]*;
fragment DQUOTA_STRING: '"' ( '\\'. | '""' | ~('"'| '\\') )* '"' | '\'' ( ~('\''|'\\') | ('\\' .) )* '\'';
fragment DEC_DIGIT: '0' .. '9'+;
// Last tokens must generate Errors
ERROR_RECONGNIGION: . -> channel(ERRORCHANNEL);
NEWLINE:'\r'? '\n' ;
BYTELENGTH_LITERAL
: DEC_DIGIT+ ('B' | 'K' | 'M' | 'G')
;
INTEGER_VALUE
: [-]*DEC_DIGIT+
;
DECIMAL_VALUE
: DEC_DIGIT+ EXPONENT
| DECIMAL_DIGITS EXPONENT? {isValidDecimal()}?
;
IDENTIFIER
: (LETTER | DEC_DIGIT | '_')+
;
BACKQUOTED_IDENTIFIER
: '`' ( ~'`' | '``' )* '`'
;
COMMA: ',' ;
LEFT_BRACKET
: '(('
;
RGIHT_BRACKET
: '))'
;
LEFT_BRACKET1
: '{{'
;
RGIHT_BRACKET1
: '}}'
;
START
: '$'
;
fragment DECIMAL_DIGITS
: DEC_DIGIT+ '.' DEC_DIGIT+
| '.' DEC_DIGIT+
;
fragment EXPONENT
: 'E' [+-]? DEC_DIGIT+
;
fragment LETTER
: [A-Z]
;
SIMPLE_COMMENT
: '--' ~[\r\n]* '\r'? '\n'? -> channel(HIDDEN)
;
BRACKETED_COMMENT
: '/*' .*? '*/' -> channel(HIDDEN)
;
WS
: [ \r\n\t]+ -> channel(HIDDEN)
;
parser grammar TransformParser;
options { tokenVocab=TransformLexer; }
@header { package com.abc.g4.gen; }
finalExpression:
(booleanExpression | caseExpression | resultExpression | function) EOF
;
caseExpression
: CASE whenClause+ (ELSE (elseExpression=resultExpression | caseExpression))? END #whenExpression
| constant #constantDefault
;
values:
constant #constantValue
| ID #idValue
;
valueCalc:
LEFT_BRACKET valueCalc RGIHT_BRACKET
| valueCalc ('*'|'/'|'%') valueCalc
| valueCalc ('+'|'-') valueCalc
| valueCalc ('<<'|'>>') valueCalc
| values
;
booleanExpression
: left=booleanExpression operator=AND right=booleanExpression #logicalBinary1
| left=booleanExpression operator=OR right=booleanExpression #logicalBinary
| NOT booleanExpression #logicalNot
| predicated #predicatedExpression
| left=valueCalc operator=comparisonOperator right=valueCalc #comparison4
| booleanValue #booleanValueTag
;
predicated
: (values | valueCalc) IN values (values)*
;
whenClause:
WHEN condition=booleanExpression THEN (result=resultExpression | caseExpression);
resultExpression:
predicated | values | valueCalc;
constant
: NULL #nullLiteral
| STRING #typeConstructor
| number #numericLiteral
| booleanValue #booleanLiteral
| STRING+ #stringLiteral
;
comparisonOperator
: EQ | NEQ | NEQJ | LT | LTE | GT | GTE | IS
;
booleanValue
: TRUE | FALSE
;
number
: MINUS? DECIMAL_VALUE #decimalLiteral
| MINUS? INTEGER_VALUE #integerLiteral
;
qualifiedName
: NAME
;
function
: qualifiedName (params) #functionCall
;
param:
valueCalc | values | function | booleanExpression
;
params:
param (param)*
;
I can recognize numbers with multiple digits, but I cannot recognize single-digit numbers.
Parentheses also cannot change the precedence of expression evaluation. What's wrong with my code?
If I replace '(' and ')' with '((' and '))' or '{{' and '}}', it works.
Resolved: after deleting the ERROR_RECONGNIGION rule, it works.
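A likely explanation, based on how ANTLR 4 resolves lexer matches (the longest match wins, and on a tie the rule declared first wins): the catch-all ERROR_RECONGNIGION rule is declared before INTEGER_VALUE, so a one-character input such as a single digit ties and goes to the catch-all, while longer inputs do not. A two-character literal such as '((' is likewise longer than anything the catch-all can match. The sketch below models this in plain Java, with regexes standing in for the two lexer rules; the class and method names are invented for illustration, and this is not ANTLR's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of lexer rule selection; regexes stand in for lexer rules.
final class RuleOrderDemo {
    // Returns the name of the rule that wins at the start of the input:
    // longest match first, earliest-declared rule on a tie.
    static String winner(Map<String, String> rulesInOrder, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, String> rule : rulesInOrder.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue(), Pattern.DOTALL).matcher(input);
            // Strictly longer only, so a tie keeps the earlier rule.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    // Order as in the question: the catch-all comes before INTEGER_VALUE.
    static Map<String, String> catchAllFirst() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("ERROR_RECONGNIGION", ".");
        rules.put("INTEGER_VALUE", "-*[0-9]+");
        return rules;
    }

    // Alternative order: the catch-all is declared last.
    static Map<String, String> catchAllLast() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("INTEGER_VALUE", "-*[0-9]+");
        rules.put("ERROR_RECONGNIGION", ".");
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(winner(catchAllFirst(), "7"));  // ERROR_RECONGNIGION
        System.out.println(winner(catchAllFirst(), "42")); // INTEGER_VALUE
        System.out.println(winner(catchAllLast(), "7"));   // INTEGER_VALUE
    }
}
```

If this model is right, moving the catch-all rule to the very end of the lexer grammar should have the same effect as deleting it, while still routing unknown characters to the error channel.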

ANTLR4 - How to match something until two characters match?

Flutter:
Framework • revision 18116933e7 (8 weeks ago) • 2021-10-15 10:46:35 -0700
Engine • revision d3ea636dc5
Tools • Dart 2.14.4
ANTLR4:
antlr4: ^4.9.3
I would like to implement a simple tool that formats text like in the following definition: https://www.motoslave.net/sugarcube/2/docs/#markup-style
So basically each __ is the start of an underlined text and the next __ is the end.
I got some issues with the following input:
^^subscript=^^
Shell: line 1:13 token recognition error at '^'
Shell: line 1:14 extraneous input '' expecting {'==', '//', '''', '__', '~~', '^^', TEXT}
MyLexer.g4:
STRIKETHROUGH : '==';
EMPHASIS : '//';
STRONG : '\'\'';
UNDERLINE : '__';
SUPERSCRIPT : '~~';
SUBSCRIPT : '^^';
TEXT
: ( ~[<[$=/'_^~] | '<' ~'<' | '=' ~'=' | '/' ~'/' | '\'' ~'\'' | '_' ~'_' | '~' ~'~' | '^' ~'^' )+
;
MyParser.g4:
options {
tokenVocab=SugarCubeLexer;
//language=Dart;
}
parse
: block EOF
;
block
: statement*
;
statement
: strikethroughStyle
| emphasisStyle
| strongStyle
| underlineStyle
| superscriptStyle
| subscriptStyle
| unstyledStatement
;
unstyledStatement
: plaintext
;
strikethroughStyle
: STRIKETHROUGH (emphasisStyle | strongStyle | underlineStyle | superscriptStyle | subscriptStyle | unstyledStatement)* STRIKETHROUGH
;
emphasisStyle
: EMPHASIS (strikethroughStyle | strongStyle | underlineStyle | superscriptStyle | subscriptStyle | unstyledStatement)* EMPHASIS
;
strongStyle
: STRONG (strikethroughStyle | emphasisStyle | underlineStyle | superscriptStyle | subscriptStyle | unstyledStatement)* STRONG
;
underlineStyle
: UNDERLINE (strikethroughStyle | emphasisStyle | strongStyle | superscriptStyle | subscriptStyle | unstyledStatement)* UNDERLINE
;
superscriptStyle
: SUPERSCRIPT (strikethroughStyle | emphasisStyle | strongStyle | underlineStyle | subscriptStyle | unstyledStatement)* SUPERSCRIPT
;
subscriptStyle
: SUBSCRIPT (strikethroughStyle | emphasisStyle | strongStyle | underlineStyle | superscriptStyle | unstyledStatement)* SUBSCRIPT
;
plaintext
: TEXT
;
I would be super happy for any help. Thanks
It's your TEXT rule:
TEXT
: (
~[<[$=/'_^~]
| '<' ~'<'
| '=' ~'='
| '/' ~'/'
| '\'' ~'\''
| '_' ~'_'
| '~' ~'~'
| '^' ~'^'
)+
;
You can't write a lexer rule in ANTLR the way you're trying to (i.e. "a '^' unless it's followed by another '^'"). The ~'^' means "any character that's not ^", and that character is consumed as part of the token.
If you run your input through grun with the -tokens option, you'll see that the TEXT token pulls in everything through the EOL:
[@0,0:1='^^',<'^^'>,1:0]
[@1,2:14='subscript=^^\n',<TEXT>,1:2]
[@2,15:14='<EOF>',<EOF>,2:0]
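To make the over-consumption concrete, here is a small Java sketch that approximates the question's TEXT rule with a regular expression. It is an approximation for illustration only (ANTLR does not use java.util.regex, and the class name TextRuleDemo is invented), but for this input it covers the same characters as the token dump above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex approximation of the question's TEXT rule, for illustration only.
final class TextRuleDemo {
    // Each "X followed by not-X" alternative consumes two characters.
    static final Pattern TEXT = Pattern.compile(
        "(?:[^<\\[$=/'_\\^~]|<[^<]|=[^=]|/[^/]|'[^']|_[^_]|~[^~]|\\^[^\\^])+");

    // Returns the text a TEXT token would cover when matching starts at 'from'.
    static String textTokenAt(String input, int from) {
        Matcher m = TEXT.matcher(input);
        m.region(from, input.length());
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        // With a trailing newline: "=^" is taken as one pair, then "^\n" as another,
        // so the closing "^^" never becomes a SUBSCRIPT token.
        System.out.println(textTokenAt("^^subscript=^^\n", 2).replace("\n", "\\n"));
        // Without the newline the last '^' has no partner and is left over,
        // which corresponds to the "token recognition error at '^'" at column 13.
        System.out.println(textTokenAt("^^subscript=^^", 2));
    }
}
```

The first call prints `subscript=^^\n` and the second prints `subscript=^`, which lines up with the two symptoms reported in the question.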
Try something like this:
grammar MyParser
;
parse: block EOF;
block: statement*;
statement
: STRIKETHROUGH statement STRIKETHROUGH # Strikethrough
| EMPHASIS statement EMPHASIS # Emphasis
| STRONG statement STRONG # Strong
| UNDERLINE statement UNDERLINE # Underline
| SUPERSCRIPT statement SUPERSCRIPT # SuperScript
| SUBSCRIPT statement SUBSCRIPT # Subscript
| plaintext # unstyledStatement
;
plaintext: TEXT+;
STRIKETHROUGH: '==';
EMPHASIS: '//';
STRONG: '\'\'';
UNDERLINE: '__';
SUPERSCRIPT: '~~';
SUBSCRIPT: '^^';
TEXT: .;
This grammar correctly parses your input, but at the expense of turning everything other than your special characters into single character tokens.
With a bit more thought, we can minimize this:
grammar MyParser
;
parse: block EOF;
block: statement*;
statement
: STRIKETHROUGH statement STRIKETHROUGH # Strikethrough
| EMPHASIS statement EMPHASIS # Emphasis
| STRONG statement STRONG # Strong
| UNDERLINE statement UNDERLINE # Underline
| SUPERSCRIPT statement SUPERSCRIPT # SuperScript
| SUBSCRIPT statement SUBSCRIPT # Subscript
| (U_TEXT | TEXT)+ # unstyledStatement
;
STRIKETHROUGH: '==';
EMPHASIS: '//';
STRONG: '\'\'';
UNDERLINE: '__';
SUPERSCRIPT: '~~';
SUBSCRIPT: '^^';
U_TEXT: ~[=/'_~^]+;
TEXT: .;
This adds the U_TEXT lexer rule. This rule will pull together all unambiguous characters into a single token, which significantly reduces the number of tokens produced (as well as the number of diagnostic warnings). It should perform much better than the first. (I've not tried or timed it on large enough input to see the difference, but the resulting parse tree is much better.)
Elaboration:
The ANTLR lexer rule evaluation works by examining your input. When multiple rules could match the next n characters of input, then it will continue looking at input characters until a character fails to match any of the "active" lexer rules. This establishes the longest run of characters that could match a lexer rule. If this is a single rule, it wins (by virtue of having matched the longest sequence of input characters). If there is more than one rule matching the same run of input characters then the lexer matches the first of those rules to appear in your grammar. (Technically, these situations are "ambiguities", as, looking at the whole grammar, there are multiple ways that ANTLR could have tokenized it. But, since ANTLR has deterministic rules for resolving these ambiguities, they're not really a problem.)
Lexer rules just don't have the ability to use negation, except for negating a set of characters (those that appear between [ and ]). That means we can't write a rule to match "a < not followed by another <". We can, however, match "<<" as a longer token than "<". To do that, we have to ensure that every character that could start one of your two-character sequences is also matched by a single-character token rule. However, we want to avoid turning ALL other characters into single-character tokens, so we can introduce a rule that is "everything but our special characters". This will greedily consume everything that isn't possibly "problematic", leaving only the special characters to be caught by the single-character `.` rule at the end of the grammar.
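The selection process described in this elaboration can be sketched as a toy tokenizer in Java. This is a simplified model under the assumption that each lexer rule can be approximated by a regular expression; it is not ANTLR's implementation, and MaximalMunchDemo is an invented name. The rules are listed in the same order as in the second grammar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy longest-match tokenizer; regexes approximate the lexer rules.
final class MaximalMunchDemo {
    // Insertion order is grammar order, which decides ties.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("STRIKETHROUGH", Pattern.compile("=="));
        RULES.put("EMPHASIS", Pattern.compile("//"));
        RULES.put("STRONG", Pattern.compile("''"));
        RULES.put("UNDERLINE", Pattern.compile("__"));
        RULES.put("SUPERSCRIPT", Pattern.compile("~~"));
        RULES.put("SUBSCRIPT", Pattern.compile("\\^\\^"));
        RULES.put("U_TEXT", Pattern.compile("[^=/'_~\\^]+"));
        RULES.put("TEXT", Pattern.compile(".", Pattern.DOTALL));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly longer wins; on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            // TEXT matches any single character, so bestRule is never null here.
            tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // [SUBSCRIPT:^^, U_TEXT:subscript, TEXT:=, SUBSCRIPT:^^]
        System.out.println(tokenize("^^subscript=^^"));
    }
}
```

The lone "=" falls through to TEXT because STRIKETHROUGH needs two of them and U_TEXT excludes it, which is exactly the role of the catch-all rule.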

Creating code snippets in Visual Studio Code using EBNF

Here they say we can generate code using EBNF but I don't understand how, it seems to only accept JSON. Does anyone know how to do it?
Thank you in advance.
The link you mentioned does not say that we can generate a new snippet using EBNF.
What the documentation says is:
Below is the EBNF (extended Backus-Naur form) for snippets
and then it gives the EBNF for snippets:
any ::= tabstop | placeholder | choice | variable | text
tabstop ::= '$' int | '${' int '}'
placeholder ::= '${' int ':' any '}'
choice ::= '${' int '|' text (',' text)* '|}'
variable ::= '$' var | '${' var '}'
| '${' var ':' any '}'
| '${' var '/' regex '/' (format | text)+ '/' options '}'
format ::= '$' int | '${' int '}'
| '${' int ':' '/upcase' | '/downcase' | '/capitalize' '}'
| '${' int ':+' if '}'
| '${' int ':?' if ':' else '}'
| '${' int ':-' else '}' | '${' int ':' else '}'
regex ::= JavaScript Regular Expression value (ctor-string)
options ::= JavaScript Regular Expression option (ctor-options)
var ::= [_a-zA-Z] [_a-zA-Z0-9]*
int ::= [0-9]+
text ::= .*
It tells you which combinations and keywords are accepted inside a snippet. In other words, the EBNF describes the syntax of the snippet body; the snippet file itself is still written in JSON. Snippet creation is limited to this at the moment; we cannot generate more advanced snippets in the current release (version 1.24).
Please read through the document to gather more information on how to make a new snippet with the given variables and the replacement logic. Thanks.
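To illustrate what the EBNF describes, here is a small Java sketch that recognises two of its productions, tabstop and placeholder, in a snippet body string. It is only an illustration under simplifying assumptions: nested placeholders, choices and variables are ignored, the class name SnippetBodyScanner is invented, and this is not how VS Code parses snippets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: recognises the tabstop and (non-nested) placeholder
// forms from the EBNF. This is not VS Code's own parser.
final class SnippetBodyScanner {
    // tabstop     ::= '$' int | '${' int '}'
    // placeholder ::= '${' int ':' any '}'   (here: 'any' without nested braces)
    private static final Pattern STOP =
        Pattern.compile("\\$(\\d+)|\\$\\{(\\d+)(?::([^}]*))?\\}");

    static List<String> scan(String body) {
        List<String> found = new ArrayList<>();
        Matcher m = STOP.matcher(body);
        while (m.find()) {
            String index = m.group(1) != null ? m.group(1) : m.group(2);
            String placeholder = m.group(3); // null for a plain tabstop
            found.add(placeholder == null
                ? "tabstop " + index
                : "placeholder " + index + "=" + placeholder);
        }
        return found;
    }

    public static void main(String[] args) {
        // Body text as it would read once taken out of the "body" field of a snippet JSON file.
        // Prints: [placeholder 2=element, placeholder 1=array, tabstop 0]
        System.out.println(scan("for (const ${2:element} of ${1:array}) {\n\t$0\n}"));
    }
}
```

The point is that the EBNF governs the text inside a snippet's body, so a snippet is still authored as a JSON entry whose body follows this grammar.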

How can I customize the autocompletion created by Xtext?

Introduction: I have written a simple grammar in Xtext which is working apart from the autocompletion. It's not like it's not working at all but it doesn't work like I want it to so I started to search for a possibility to customize the autocompletion. I found out that there are a lot of possibilities but I couldn't find any concrete explanations to do so.
And that's what I'm asking for here. I hope you can help me.
My Grammar:
grammar sqf.Sqf with org.eclipse.xtext.common.Terminals
generate sqf "http://www.Sqf.sqf"
Model:
elements += Element*
;
Element:
Declaration ";" | Command ";"
;
Declaration:
Variable | Array
;
Variable:
name=ID "=" content=VARCONTENT (("+"|"-"|"*"|"/") content2+=VARCONTENT)*
;
Array:
name=ID "=" contet=ArrayLiteral | name=ID "=" "+" content2+=[Array]
;
ArrayLiteral:
"[" (content += ArrayContent)* "]" (("+"|"-")content1+=Extension)*
;
ArrayContent:
content01=Acontent ("," content02+=Acontent)*
;
Acontent:
STRING | INT | Variable | ref=[Declaration] | ArrayLiteral
;
Extension:
STRING | INT | ref=[Declaration]
;
Command:
Interaction
;
Interaction:
hint
;
hint:
Normal | Format | Special
;
Normal:
name=HintTypes content=STRING
;
Format:
name=HintTypes "format" "[" content=STRING "," variable=[Declaration] "]"
;
HintTypes:
"hint" | "hintC" | "hintCadet" | "hintSilent"
;
Special:
hintCArray
;
hintCArray:
title=STRING "hintC" (content1=ArrayLiteral | content=STRING)
;
VARCONTENT:
STRING | INT | ref=[Declaration] | "true" | "false" | "nil"
;
My Problem: When I hit Ctrl+Space in the Eclipse editor (with my plug-in loaded), I only get the proposals "title"-STRING (from the hintCArray rule) and name-ID (from the Declaration rule).
What I would expect is that it also offers me the possibility to choose between my hint commands. I also tried hitting Ctrl+Space after having begun to type the command (hin), and as a result the autocompletion put a "=" next to it, as if it were a Variable or Array declaration.
What I have found out is that I have to edit some of the proposalProvider.java-Files (I think it's the AbstractProposalProvider) in the .ui-section.
I now have the problem that I don't know what I have to write in there.
In this AbstractProposalProvider.java there's stuff like this:
public void completeModel_Elements(EObject model, Assignment assignment, ContentAssistContext context, ICompletionProposalAcceptor acceptor) {
completeRuleCall(((RuleCall)assignment.getTerminal()), context, acceptor);
}
Greetings Raven
There are several problems with your grammar, but concentrating only on your actual question:
Stuff that is inside a DataTypeRule (here: HintTypes) will never be shown in the proposals out of the box.
To make it happen without coding anything, you could inline the possible types as keywords like this:
Normal:
name=("hint" | "hintC" | "hintCadet" | "hintSilent") content=STRING;
Format:
name=("hint" | "hintC" | "hintCadet" | "hintSilent") "format" "[" content=STRING "," variable=[Declaration] "]";
Cheers,
Holger

ANTLR4 matching a string with a start character and end character

I am trying to write antlr grammar so that I can create a match on a certain ID.
I need to match a string that starts with the character 'n' and ends with 'd'.
This ID can contain a space.
Everywhere else I want to ignore whitespace.
// lexer/terminal rules start with an upper case letter
ID
:
(
'a'..'z'
| 'A'..'Z'
| '0'..'9'
| ('+'|'-'|'*'|'/'|'_')
| '='
| '~'
| '{'
| '}'
| ','
| NA
)+
;
NA : 'n'[ ]['a'..'z']'d' ;
WS : [ \t\n]+ -> skip;
I tested this with the expression A1=not attempted
It considers A1=not as an ID and attempted as an error node.
Can you have a grammar that ignores whitespace but makes an exception for a certain string such as "not attempted"?
You should try to separate the ID ("A1") from the rest. You also need to take care of the priority of lexer rules: your "n...d" rule should have higher priority, so make it one of your first lexer rules.
A working grammar (only tested with your example "A1=not attempted") is:
statement : ID expr;
expr : OP expr
| (NA | ID | OP)
;
NA : 'n'[a-zA-Z ]*'d' ;
ID
: (
'a'..'z'
| 'A'..'Z'
| '0'..'9'
| ('+'|'-'|'*'|'/'|'_')
)+ ;
OP : '='
| '~'
| '{'
| '}'
| ','
;
WS : [ \t\r\n]+ -> skip;
Try it with the start rule statement. I changed the NA rule so that, between the 'n' and the 'd', it matches zero or more letters (a to z, A to Z) and spaces in any order.
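One thing to watch with a rule like NA is that the loop is greedy, so it runs to the last 'd' in any stretch of letters and spaces. The Java sketch below uses a regular expression as a stand-in for the NA rule to show this; it is an illustration under that assumption rather than ANTLR itself, and the class name NaRuleDemo is invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex stand-in for the NA lexer rule; for illustration only.
final class NaRuleDemo {
    static final Pattern NA = Pattern.compile("n[a-zA-Z ]*d");

    // Returns what NA would cover when matching starts at the beginning of 'input'.
    static String naAtStart(String input) {
        Matcher m = NA.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(naAtStart("not attempted"));            // not attempted
        System.out.println(naAtStart("not attempted and passed")); // not attempted and passed
        System.out.println(naAtStart("need"));                     // need
    }
}
```

If only the exact phrase should be treated specially, a literal rule for that phrase may be a safer choice than a general "n...d" pattern.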
Good luck with ANTLR, it's a nice tool.