Given the following grammar:
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Program:
{Range} ID '.' '.' ID
| {Group} ID ID ID ID
;
terminal ID:
'a' | '.'
;
and the following input:
a . . a
I would argue that there are two ways in which the string can be parsed: as a Range (the first alternative) or as a Group (the second alternative). When I try this in my generated IDE and inspect the Ecore model, a Range is instantiated.
What makes Xtext decide in favor of the Range?
Edit: specifically, I'm wondering why the Xtext grammar itself is not ambiguous, since a range 'a'..'z' can be parsed as either a Group of Keyword, Wildcard, Wildcard, Keyword or as a CharacterRange of Keyword, Keyword.
Keywords become lexer rules as well. Thus you have two lexer rules:
terminal FULL_STOP_KEYWORD: '.' ;
and
terminal ID: 'a' | '.';
The lexer is not stateful, and only one rule can win for any given input. Thus '.' will always be lexed as the keyword token and never as ID.
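The effect can be sketched with a toy simulation. This is not Xtext's actual lexer; the rule list and priority scheme below only illustrate the idea that keyword rules are tried before user-defined terminals, so '.' never reaches the ID rule:

```python
# Toy simulation of a context-free lexer with rule priorities.
# NOT Xtext's implementation; it only illustrates why '.' is always
# lexed as the keyword token and never as ID.
rules = [
    ("FULL_STOP_KEYWORD", {"."}),  # keyword rules come first
    ("ID", {"a", "."}),            # the user-defined terminal
]

def tokenize(chars):
    tokens = []
    for ch in chars:
        for name, alphabet in rules:  # first matching rule wins
            if ch in alphabet:
                tokens.append((name, ch))
                break
    return tokens

print(tokenize(list("a..a")))
# [('ID', 'a'), ('FULL_STOP_KEYWORD', '.'), ('FULL_STOP_KEYWORD', '.'), ('ID', 'a')]
```

Because every '.' becomes the keyword token, only the Range alternative (ID '.' '.' ID) can ever match the input, which is why a Range is instantiated.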
Related
I am using the grammar file at https://github.com/antlr/grammars-v4/blob/master/sql/tsql/TSqlParser.g4. It has a built_in_functions grammar rule. I want to parse a new function, DAYZ, as a built-in function. I introduced it thus in the .g4
built_in_functions
// https://msdn.microsoft.com/en-us/library/ms173784.aspx
: BINARY_CHECKSUM '(' '*' ')' #BINARY_CHECKSUM
// https://msdn.microsoft.com/en-us/library/ms186819.aspx
| DATEADD '(' datepart=ID ',' number=expression ',' date=expression ')' #DATEADD
| DAYZ '(' date=expression ')' #DAYZ
When I use grun to test the grammar, I get unexpected results for DAYZ. For DATEDIFF I get what I expect.
For DAYZ, I get the following tree
Why does the parser not treat DAYZ as satisfying the rule built_in_functions like it does for DATEDIFF ? If the parser recognizes DAYZ eventually as an _Id, it should do the same for DATEDIFF. There must be something wrong in the way I am introducing DAYZ into the grammar but I can't figure it out. Any help appreciated. And apologies if I am not using the correct ANTLR terminology. I am a newbie to ANTLR.
I am using antlr-4.9.2-complete.jar
Move your lexer rule for DAYZ to appear before the ID rule in the TSqlLexer.g4 file.
Since the id_ rule is recognizing the token, it must be being tokenized as an ID token. This will happen if your DAYZ rule definition appears after the ID rule definition.
When ANTLR finds two lexer rules that match the same string of input characters (i.e. "DAYZ"), then it will use whichever rule appears first in the grammar.
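The rule-order effect can be sketched outside ANTLR. The rule lists and regexes below are hypothetical, and this is only a simplified model of ANTLR's actual algorithm (longest match, with the earlier rule winning ties):

```python
import re

# Toy illustration of lexer rule priority: when two rules match the same
# (longest) text, the rule listed first wins.  Not ANTLR itself.
rules_dayz_last  = [("ID", r"[A-Z_]+"), ("DAYZ", r"DAYZ")]  # DAYZ after ID
rules_dayz_first = [("DAYZ", r"DAYZ"), ("ID", r"[A-Z_]+")]  # DAYZ before ID

def first_token(text, rules):
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        # strictly longer matches win; ties keep the earlier rule
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(first_token("DAYZ", rules_dayz_last))   # ('ID', 'DAYZ')
print(first_token("DAYZ", rules_dayz_first))  # ('DAYZ', 'DAYZ')
```

With the DAYZ rule listed after ID, both rules match the full text "DAYZ" and ID wins, which matches the behavior described in the question.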
I'm trying to understand this code snippet from:
https://code.kx.com/q/kb/loading-from-large-files/
to customize it myself (e.g. partition by hours, minutes, number of ticks, ...):
$ cat fs.q
\d .Q
/ extension of .Q.dpft to separate table name & data
/ and allow append or overwrite
/ pass table data in t, table name in n, : or , in g
k)dpfgnt:{[d;p;f;g;n;t]if[~&/qm'r:+en[d]t;'`unmappable];
{[d;g;t;i;x]#[d;x;g;t[x]i]}[d:par[d;p;n];g;r;<r f]'!r;
#[;f;`p#]#[d;`.d;:;f,r#&~f=r:!r];n}
/ generalization of .Q.dpfnt to auto-partition and save a multi-partition table
/ pass table data in t, table name in n, name of column to partition on in c
k)dcfgnt:{[d;c;f;g;n;t]*p dpfgnt[d;;f;g;n]'?[t;;0b;()]',:'(=;c;)'p:?[;();();c]?[t;();1b;(,c)!,c]}
\d .
r:flip`date`open`high`low`close`volume`sym!("DFFFFIS";",")0:
w:.Q.dcfgnt[`:db;`date;`sym;,;`stats]
.Q.fs[w r#]`:file.csv
But I couldn't find any resources that explain it in detail. For example:
if[~&/qm'r:+en[d]t;'`unmappable];
what does it do with the parameter d?
(Promoting this to an answer as I believe it helps answer the question).
Following on from the comment chain: in order to translate the k code into q code (or simply to understand the k code) you have a few options, none of which are particularly well documented, as that would defeat the purpose of the q language: to be the wrapper which obscures the k language.
Option 1 is to inspect the built-in functions in the .q namespace
q).q
| ::
neg | -:
not | ~:
null | ^:
string | $:
reciprocal| %:
floor | _:
...
Option 2 is to inspect the q.k script which creates the above namespace (be careful not to edit/change this):
vi $QHOME/q.k
Option 3 is to lookup some of the nuggets of documentation on the code.kx website, for example https://code.kx.com/q/wp/parse-trees/#k4-q-and-qk and https://code.kx.com/q/basics/exposed-infrastructure/#unary-forms
Option 4 is to do a Google search for reference material for other/similar versions of k, for example k2/k3. They tend to be similar-ish.
A final point to note is that in most of these examples you'll see a colon (:) after the primitives. This colon is required in q/kdb+ to use the monadic form of the primitive (most are heavily overloaded), while in k it is not required to explicitly force the monadic form. This is why where will show as &: in the q reference but will usually just be & in actual k code.
I am replacing our logging functionality and it is taking a long time to manually go through all of the code and replace it.
Here is the current code:
Error Messages:
cLogger.LogMessage(ComponentID.ClientID, CLASS_NAME, "AddContextMenuItem", MessageType.mtErrorMessage, "Null MenuItem provided. MenuItem's status not changed");
cLogger.LogMessage(ComponentID.ClientID, CLASS_NAME, "enableDisableToolbarItem", MessageType.mtErrorMessage, "Invalid toolbaritem provided.");
Exceptions:
cLogger.LogMessage(ComponentID.ClientID, CLASS_NAME, "enableDisableContextMenuItem", MessageType.mtException, ex);
cLogger.LogMessage(ComponentID.ClientID, CLASS_NAME, "AddToolbarItem", MessageType.mtException, exc);
Is there a simple way to create a macro (never used a macro before) or power shell or notepad++ script or something else to find and replace all of these different instances so that they look like the following:
New Error Messages:
logger.Log(LogLevel.Error, CLASS_NAME + " AddContextMenuItem - Null MenuItem provided. MenuItem's status not changed");
logger.Log(LogLevel.Error, CLASS_NAME + " enableDisableToolbarItem - Invalid toolbaritem provided.");
and
New Exceptions:
logger.Log(LogLevel.Exception, CLASS_NAME + " enableDisableContextMenuItem - " + ex);
logger.Log(LogLevel.Exception, CLASS_NAME + " AddToolbarItem - " + exc);
I am replacing the code in the entire project and it will simply take way too long to go through and change all of the logging code manually. Any help is greatly appreciated.
There are a few options:
Regex Search & Replace in Visual Studio:
search for the exception example
\w+\.LogMessage\(\s*[^,]+,\s*([^,]+),\s*"([^"]+)",\s*MessageType\.mtException,\s*([^)]+)\);
replace
logger.Log(LogLevel.Exception, $1 + " $2 - " + $3);
Use Resharper structural Search & Replace
Build a CodeFix for Roslyn
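For reference, the same kind of transformation can be scripted outside the editor. The sample line below is modeled on the question, and the pattern is only a sketch that may need tightening for a real codebase:

```python
import re

# Hypothetical sample line modeled on the question; the pattern follows
# the same capture-group idea as the editor regex, in Python's re syntax.
line = ('cLogger.LogMessage(ComponentID.ClientID, CLASS_NAME, '
        '"AddToolbarItem", MessageType.mtException, exc);')

pattern = re.compile(
    r'\w+\.LogMessage\('             # any logger variable, then .LogMessage(
    r'\s*[^,]+,\s*'                  # first argument (ComponentID...) is dropped
    r'([^,]+),\s*'                   # group 1: the CLASS_NAME argument
    r'"([^"]+)",\s*'                 # group 2: the quoted method name
    r'MessageType\.mtException,\s*'  # only the exception overload
    r'([^)]+)\);'                    # group 3: the exception variable
)

new_line = pattern.sub(r'logger.Log(LogLevel.Exception, \1 + " \2 - " + \3);', line)
print(new_line)
# logger.Log(LogLevel.Exception, CLASS_NAME + " AddToolbarItem - " + exc);
```

The error-message overload would need a second pattern along the same lines, matching MessageType.mtErrorMessage and a quoted message instead of an exception variable.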
Yes, you can likely do this with a regular expression; it is probably easier in PowerShell than in Notepad++ or VSCode.
It's difficult to tell from your examples precisely what you are changing in each item, but the basic concept is to do the following:
Match the static text that establishes the type of item to change
Also match the variable text with wildcards (.* etc) enclosed in CAPTURING parentheses
Replace with new static text and 'rearranged' variable text using the $1, $2, etc backreferences to the capture groups (or $Matches[1] etc.)
If #3 is more complicated, you'll need to further alter the variable text before replacing -- this is where a script language has an advantage over a pure search and replace.
Here is a simplified example (PowerShell, but similar in other languages or editors that support regexes) for statically replacing "Function_OldName" while swapping the order of Param1 and Param2 and altering the names based on the original names for these params:
"Function_OldName Param1 Param2" -replace 'Function_OldName\s+(\w+)\s+(\w+)',
'NewFunctionName New$2Parm New$1Parm'
The $1 and $2 are backreferences to the "1st parens" and "2nd parens" respectively in the match string.
If you can write out clear examples showing which parts of your changed text must be matched, simply altered, rearranged, or rebuilt then it might be possible to show you some more relevant examples....
You can do this across many files with either PowerShell or the editors, but generally doing it to many files is again a bit easier in a programming language (e.g., PowerShell).
Get-ChildItem *.ps1 -Recurse | ForEach-Object {
    # match:       'Function_OldName\s+(\w+)\s+(\w+)'    <- your match goes here
    # replacement: 'NewFunctionName New$2Parm New$1Parm' <- your replacement goes here
    (Get-Content $_.FullName -Raw) -replace 'Function_OldName\s+(\w+)\s+(\w+)', 'NewFunctionName New$2Parm New$1Parm' |
        Set-Content $_.FullName
}
I was reading a SIP (Scala Improvement Process) document and found this syntax:
We introduce a new form of expression for processed strings: Syntax:
SimpleExpr1 ::= … | processedStringLiteral
processedStringLiteral
::= alphaid`"' {printableChar \ (`"' | `$') | escape} `"'
| alphaid `"""' {[`"'] [`"'] char \ (`"' | `$') | escape} {`"'} `"""'
escape ::= `$$'
| `$' letter { letter | digit }
| `$'BlockExpr
alphaid ::= upper idrest
| varid
I would like to be able to understand this syntax but I don't even know:
What it's called? (if it's called anything)
If it is specific to SIP's
Everything that I think I know are assumptions from other programming languages or specifications, like:
| denotes an alternative unless used at start of line, then it just says the line continues.
\ is an escape character
This notation defines a concept in terms of other concepts, i.e. processedStringLiteral is defined in terms of alphaid, escape and printableChar (even though I have no idea where printableChar is defined).
The questions:
Are my assumptions correct?
What about the remaining notation, like ::= and `"'?
How would I read this as if I was reading english? I.e: "A processed string literal starts with a letter followed by a space... " (assuming I can even read it like this).
Summary:
This notation is called Extended Backus-Naur Form.
It is not specific to SIPs.
Your assumptions are partially correct.
I'll explain what those symbols mean in the longer version
I'll give examples of English translations in the longer version
Yes this is a fragment. Not all definitions are present.
Longer version:
What you are seeing is, as #pedrofurla points out, Extended Backus-Naur Form, which is unfortunately not well defined. This link lists many different variants of it that you might find in the wild. Like pseudo-code, you'll come to see a lot of conventions that appear over and over again, and therefore in most practical cases it is unambiguous what the EBNF means. It is used to specify a certain grammar*, i.e. a "valid" subset, for the task at hand, of all strings (e.g. syntactically correct code in a given language). It is not specific to SIPs.
It is generally (with an exception in this particular variant being used) an additive specification. Each line is a new rule that adds a new kind of valid string to the valid subset of all strings that we are defining.
What I describe next will be the particular variant used here, but most other variants are similar with minor syntactic differences or renamings.
Every rule (often called a production rule) consists of two parts: a variable name (usually called a nonterminal symbol) on the left hand side followed by ::= which you can read as "is defined as" and a series of characters that then define the variable.
In this particular case things quoted by ` and ' are constants (usually called a terminal symbol), that is atomic strings that are always considered valid. All nonquoted names are variables (again nonterminal symbols) that refer to a string deemed valid by the rule that defined that variable.
| is indeed meant to be read as "or."
\ is the exception to the additive nature of this notation. It is meant to be read as "except for." It is the same symbol that is used in mathematics to denote set difference (subtracting the elements of one set from another).
{...} is read as "0 or more of these."
[...] is read as "0 or 1 of these."
(...) is traditional grouping/association like you might find in any programming language.
Finally, juxtaposition (just a space) is used for concatenation.
Let's put it all together for some basic examples!
trivialidentifier ::= `this' | `that'
In English: "The set of strings I consider valid are all strings that are trivialidentifiers. trivialidentifiers are 'this' or 'that'." Hence the only strings considered valid here are "this" and "that".
Let's try something more:
name ::= `John' | `Mary' | `Jane'
verb ::= `runs' | `walks'
sentence ::= (name \ `Mary') ` ' verb
In English: "Here are the valid strings we care about: A name is 'John', 'Mary', or 'Jane'. A verb is 'runs' or 'walks'. A sentence is any name except for 'Mary' followed by a space and any verb." So for example "John runs" is a valid sentence but "Mary runs" is not.
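The sentence example can also be checked mechanically. Here is a minimal Python recognizer for it; the function and variable names are mine, but the sets come straight from the grammar:

```python
# Recognizer for:  sentence ::= (name \ `Mary') ` ' verb
names = {"John", "Mary", "Jane"}
verbs = {"runs", "walks"}

def is_sentence(s):
    parts = s.split(" ")
    if len(parts) != 2:
        return False
    name, verb = parts
    # (name \ `Mary') excludes "Mary" from the valid names
    return name in (names - {"Mary"}) and verb in verbs

print(is_sentence("John runs"))  # True
print(is_sentence("Mary runs"))  # False
```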
And now for something recursive:
thing ::= `a' | { thing }
In English: "Here are our valid strings we care about. A thing is either 'a' or zero or more repetitions of thing." In other words any repetition of "a", such as "", "a", "aa", "aaa", etc.
Note that the above is equivalent to (writing `' for the empty string)
thing ::= `' | `a' | `a' [ ( thing \ `' ) ]
Now let's turn back to the SIP and just translate the processedStringLiteral production rule.
A processedStringLiteral is an alphaid, followed by a quote, followed by zero or more printableChars (except for quote or the dollar sign) or escapes (with possible intermingling of the two), ending in another quote.
Alternatively, it is an alphaid followed by three quotes, then zero or more of the following: up to two consecutive quotes followed by any char except another quote or a dollar sign, or an escape. You can then add any number of quotes followed by a final three quotes.
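As a sanity check, the first (single-quote) alternative can be approximated with a Python regex. This is a deliberate simplification: the `$'BlockExpr escape and the triple-quote form are omitted, and alphaid is approximated by a plain identifier pattern:

```python
import re

# Approximation of:  alphaid `"' {printableChar \ (`"' | `$') | escape} `"'
# with escape ::= `$$' | `$' letter {letter | digit}   (BlockExpr omitted)
alphaid = r'[A-Za-z_][A-Za-z0-9_]*'
escape = r'\$\$|\$[A-Za-z][A-Za-z0-9]*'
lit = re.compile(rf'{alphaid}"(?:[^"$]|{escape})*"$')

print(bool(lit.match('s"hello $name"')))    # True
print(bool(lit.match('s"lone $ dollar"')))  # False: bare `$' is not an escape
```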
* EBNF is not powerful enough to describe all grammars. It only describes grammars known as context free grammars.
I started playing around with xtext a few days ago and just went through the tutorials. Maybe the solution has been covered in the reference somewhere but I cannot get it right quickly.
My problem is this. I tried to write a simple grammar which mixed in org.eclipse.xtext.common.Terminals. Then I wanted to insert a custom terminal FILE_NAME like this:
terminal FILE_NAME:
( !('/' | '\\' | ':' | '*' | '?' | '"' | '<' | '>' | '|') )+
;
That's basically what a filename is allowed to be under Windows. However, by doing that, inherited rules like ID, INT, etc. would never be matched, because they are always generated after custom terminals.
Can that kind of problem be avoided gracefully (as repeatless as possible and as general as possible)? Thanks in advance!
Terminal rules (aka lexer rules) are used to tokenize the input sequence. IMHO there should be a minimum of semantics in terminal rules.
You are trying to express a specialized parser rule which accepts only valid file names.
Have a look at parser phases described in the Xtext Documentation [1]. My suggestion:
Lexing: Instead of using a specialized terminal rule go with STRING.
Validation: Write a validation rule for an EClass with a 'fileName' EAttribute.
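The validation idea, sketched in plain Python terms (the function name is just an illustration, not Xtext API; the forbidden-character set is taken from the grammar attempt above):

```python
# Validate a file name after parsing instead of restricting the lexer.
# Forbidden characters are the ones from the FILE_NAME terminal above.
FORBIDDEN = set('/\\:*?"<>|')

def is_valid_windows_filename(name):
    return len(name) > 0 and not (set(name) & FORBIDDEN)

print(is_valid_windows_filename("report.txt"))  # True
print(is_valid_windows_filename("a:b"))         # False
```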
as repeatless as possible and as general as possible
You don't want to repeat your validation for every EClass with a 'fileName' EAttribute. Introduce a new super type with a 'fileName' EAttribute if you have a refined Ecore model.
Then you can implement one general validation rule #check_fileName_is_valid(ElementWithFile).
And if you don't have a refined metamodel, use meta model hints within your grammar. If you provide a generalized super type, Xtext's Ecore inferrer will pull up common features of the subtypes. Ex:
ElementWithFile: A | B;
A: ... 'file' fileName=STRING ...;
B: ... 'file' fileName=STRING ...;
// => Ecore: ElementWithFile.fileName<EString>
[1] http://www.eclipse.org/Xtext/documentation.html#DSL