I am creating a DSL in Xtext and have run into a problem that is probably caused by ambiguity or left recursion. I am not sure which of the two applies to my case (similar topics I found online mention that these problems are often related), but I suspect left recursion.
In my DSL code below I declare a rule called Expression. It aggregates several other rules, which in turn aggregate further rules, until eventually the Factor rule is reached. The whole 'aggregation tree' boils down to a problem with Factor, because Factor can contain an Expression again. So there is a loop: an Expression can eventually contain a Factor, and the Factor can contain an Expression, and this can in theory continue indefinitely. I guess that is where the problem lies, because the ANTLR parser used by Xtext was not designed for this kind of recursion. I tried to solve this with a syntactic predicate (the => symbol) in the Expression rule:
(=> "endSimpleExpression")
but it's still not working. I am sure the problem lies in the relationship between Expression and Factor, because if I remove Expression from the Factor rule, the DSL works fine. I assume I am not placing the syntactic predicate in the right place. Another solution I considered is left factoring, but I don't know how to apply it in this case. I am curious to hear your thoughts on this problem.
grammar org.xtext.example.mydsl.FinalDsl with org.eclipse.xtext.common.Terminals
generate finalDsl "http://www.xtext.org/example/mydsl/FinalDsl"
'functionName' name = STRING
functions += FunctionElements*
// Function elements of which the model exists. The model can contain
// library functions, for loops, and if/else statements.
ifElseStatements += IfElseStatements |
statements += Statement
// IfElse Statements requiring if statements and optionally followed by
// one else statement.
ifStatements += IfStatements
(elseStatement = ElseStatement)?
// If statements requiring conditions and optionally followed by
// library functions or for loops.
expression = Expression
(ifFunctions += libraryFunctionsEnum | forLoops += ForLoops)
// Else statement requiring one or multiple library functions.
'else' elseFunctions += libraryFunctionsEnum
// For loops requiring one condition and followed by zero or more
// library functions
expressions = Expression
libraryFunctions += libraryFunctionsEnum*
//compoundStatement += CompoundStatement | //left out of Statement because
// otherwise a recursive call exists (statement += compoundstatement += statement
simpleStatement += SimpleStatement |
structuredStatement += StructuredStatement
classOperationStatement += ClassOperationStatement |
libraryInterFaceMethodStatement += LibraryInterFaceMethodStatement |
libraryPersistenceMethodStatement += LibraryPersistenceMethodStatement
forLoops += ForLoops | ifElseStatements += IfElseStatements
classOperationName += libraryFunctionsEnum
interfaceMethods += libraryInterFaceMethodStatementEnum
persistenceMethods += libraryPersistenceMethodStatementEnum
//*Eventually filled with details from class diagram, but for now we manually fill it for the sake of testing.
enum libraryFunctionsEnum:
hasCode= 'encrypt'|
enum libraryPersistenceMethodStatementEnum:
createInstance = "createInstance" |
log = "log"
enum libraryInterFaceMethodStatementEnum:
mesasge = "message" |
error = "error"
simpleExpression = SimpleExpression
(relationalOperator = RelationalOperator
additionalSimpleExpression = SimpleExpression)?
(=> "endSimpleExpression")
term = Term
additionalExpressions += AdditionalExpressions*
additionOperator = AdditionOperator
term = Term
factorTerm = Factor
additionalTerm += AdditionalTerm*
multiplicationOperator = MultiplicationOperator
factor = Factor
// We can optionally integrate Java types right here (int, boolean, string, etc.)
Factor: {Factor} (
"(" expression = Expression ")" |
//'not' factor += Factor |
operationParameterName = OperationParameterName |
classAttributeName += ClassAttributeName |
INT //| STRING //| set = Set
OperationParameterName: // We can use identifiers right here, but for now I put in a string
'operationParameter' STRING
ClassAttributeName: // We can use identifiers right here, but for now I put in a string
"=" | "<>" | "<" | "<=" | ">" | ">=" | "in"
"+" | "-" | "or"
"*" | "/" | "and"
enum logicalOperators:
InternalFinalDsl.g:139:2: [fatal] rule ruleFunctionElements has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2.
So let's look at the rule FunctionElements:
ifElseStatements += IfElseStatements |
statements += Statement
Okay, so FunctionElements can either be an IfElseStatements or a Statement. That sounds suspicious, and sure enough: a Statement can be a StructuredStatement, which can in turn be an IfElseStatements. So the above is ambiguous, because an IfElseStatements could be derived either directly via FunctionElements -> IfElseStatements or indirectly via FunctionElements -> Statement -> StructuredStatement -> IfElseStatements.
So you should simply remove the IfElseStatements alternative from FunctionElements, because it is redundant.
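The ambiguity can be made concrete by counting derivation paths. The following is a minimal sketch in Python (not Xtext): the dictionary is a hypothetical, simplified model of only the rules named in this answer, and the helper `derivation_paths` is made up for illustration. It shows the reasoning, not how ANTLR actually analyses the grammar.

```python
# Hypothetical, simplified model of the rules discussed above:
# each rule maps to the rules its alternatives can invoke directly.
GRAMMAR = {
    "FunctionElements": ["IfElseStatements", "Statement"],
    "Statement": ["SimpleStatement", "StructuredStatement"],
    "StructuredStatement": ["ForLoops", "IfElseStatements"],
}


def derivation_paths(start, target, grammar):
    """Return every chain of rule invocations leading from start to target.

    Assumes the modelled part of the grammar is acyclic.
    """
    if start == target:
        return [[target]]
    paths = []
    for child in grammar.get(start, []):
        for tail in derivation_paths(child, target, grammar):
            paths.append([start] + tail)
    return paths


paths = derivation_paths("FunctionElements", "IfElseStatements", GRAMMAR)
for path in paths:
    print(" -> ".join(path))

# Same model with the redundant alternative removed from FunctionElements.
FIXED = dict(GRAMMAR, FunctionElements=["Statement"])
fixed_paths = derivation_paths("FunctionElements", "IfElseStatements", FIXED)
print(len(paths), len(fixed_paths))
```

With the original model there are two ways to reach IfElseStatements; after dropping the direct alternative only one remains.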
All of the following lines of code are Julia expressions:
x = 10
1 + 1
If you want to pass an expression to a macro, it works like this. The macro foo just returns the given expression, which is then executed:
macro foo(ex)
    return ex
end
@foo println("yes") # prints yes
x = @foo 1+1
println(x) # prints 2
If you want to convert a string into an expression, you can use Meta.parse():
string = "1+1"
expr = Meta.parse(string)
x = @foo expr
println(x) # prints 1 + 1
But, obviously, the macro treats expr as a symbol. What am I getting wrong here?
Thanks in advance!
Macro hygiene is important: "macros must ensure that the variables they introduce in their returned expressions do not accidentally clash with existing variables in the surrounding code they expand into." There is a section on it in the docs. It is easiest just to show a simple case:
macro foo(x)
    return :($x)
end
When you enter an ordinary expression in the REPL, it is evaluated immediately. To suppress that evaluation, surround the expression with :( ).
julia> 1 + 1
2
julia> :(1 + 1)
:(1 + 1)
# note this is the same result as you get using Meta.parse
julia> Meta.parse("1 + 1")
:(1 + 1)
So, Meta.parse will convert an appropriate string to an expression, and if you eval the result, the expression will be evaluated. Note that printing a simple expression removes the outer :( ).
julia> expr = Meta.parse("1 + 1")
:(1 + 1)
julia> print(expr)
1 + 1
julia> result = eval(expr)
2
Usually, macros are used to manipulate things before the usual evaluation of expressions; they are syntax transformations, mostly. Macros are performed before other source code is compiled/evaluated/executed.
Rather than seeking a macro that evaluates a string as if it were typed directly into the REPL (without quotes), use this function instead.
evalstr(x::AbstractString) = eval(Meta.parse(x))
While I do not recommend this next macro, it is good to know the technique.
A macro named <name>_str is used like this: <name>"<string contents>"
julia> macro eval_str(x)
           :(eval(Meta.parse($x)))
       end
julia> eval"1 + 1"
2
(p.s. do not reuse Base function names as variable names, use str not string)
Please let me know if there is something I have not addressed.
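The distinction this answer draws between parsing a string into an expression and evaluating that expression is not specific to Julia. As a rough analogue only, here is a short Python sketch using the standard `ast` module; it assumes Python 3.9 or later for `ast.unparse`, and it says nothing about how Julia macros are expanded.

```python
import ast

source = "1 + 1"

# Parsing produces a syntax tree; nothing has been evaluated yet.
tree = ast.parse(source, mode="eval")
print(type(tree).__name__)
print(ast.unparse(tree))  # prints the expression back as source text

# Evaluation is a separate, explicit step on the parsed expression.
result = eval(compile(tree, "<string>", "eval"))
print(result)
```

As with `Meta.parse` followed by `eval`, the string only becomes a value once the parsed expression is explicitly evaluated.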
Scala's parser combinators let you specify repetitions as 0 or more (.*), 1 or more (.+), etc. Is it possible to specify a range? For example, rep(p, q)(parser), where parser is run at least p times and at most q times.
The Combinator Operations in Scala's Parser Combinators are roughly modeled after Regular Expressions. Regular Expressions only have three Combinator Operations:
Given two Regular Expressions R and S,
RS is a Regular Expression (Concatenation)
R | S is a Regular Expression (Alternation)
R* is a Regular Expression (Kleene Star)
That's it. Scala adds a couple more, most notably + and ? but that doesn't actually increase the power since R+ is actually just RR* and R? is just R | ε. The same applies to your proposed rep combinator. It does not actually increase the power since
R{m, n}
is actually just
R R … R   R? R? … R?
↑ m×R ↑   ↑ (n-m)×R? ↑
Now, of course, while this doesn't increase the power of the Parsers, it does increase the expressivity and thus readability and maintainability.
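The rewrite of R{m, n} into m copies of R followed by (n - m) copies of R? can be checked mechanically. The sketch below uses Python's `re` module rather than Scala; the helper name `expand_range` is made up for this illustration.

```python
import re


def expand_range(r, m, n):
    """Rewrite r{m,n} using only concatenation and the optional operator."""
    required = f"(?:{r})" * m
    optional = f"(?:{r})?" * (n - m)
    return required + optional


pattern = expand_range("ab", 2, 4)
print(pattern)

# Compare the expanded pattern with the built-in range quantifier.
for count in range(7):
    text = "ab" * count
    expanded = re.fullmatch(pattern, text) is not None
    builtin = re.fullmatch(r"(?:ab){2,4}", text) is not None
    print(count, expanded, builtin)
```

Both patterns accept exactly the inputs with two, three or four repetitions, which is the sense in which the range form adds convenience but not power.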
It would be pretty easy to build it on top of | and repN, I believe.
Something like:
/** A parser generator for a number of repetitions within a range.
 *  `repMN(min, max, p)` uses `p` between `min` and `max` times to parse the input
 *  (the result is a `List` of the consecutive results of `p`).
 *
 *  @param min the minimum number of times `p` must succeed
 *  @param max the maximum number of times `p` is applied
 *  @param p   a `Parser` that is to be applied successively to the input
 *  @return    a parser that returns a list of the results produced by repeatedly applying `p`
 *             (and that only succeeds if `p` matches at least `min` times)
 */
def repMN[T](min: Int, max: Int, p: ⇒ Parser[T]) =
  // Try the longest repetition first and fall back to shorter ones.
  (min to max).reverse map { repN(_, p) } reduce {_ | _}
This looks useful enough that it might even make sense to file as an enhancement request.
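To show the same construction outside of Scala, here is a minimal Python sketch of the idea: build a fixed-count repetition, then combine the counts from the largest to the smallest with an ordered alternative. The tiny combinators (`literal`, `rep_n`, `alt`, `rep_mn`) are hypothetical helpers written for this example, not part of any library; a parser here is just a function from (text, position) to (value, new position) or None.

```python
from functools import reduce


def literal(s):
    """Parser that matches the exact string s."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse


def rep_n(n, p):
    """Parser that applies p exactly n times and collects the results."""
    def parse(text, pos):
        results = []
        for _ in range(n):
            outcome = p(text, pos)
            if outcome is None:
                return None
            value, pos = outcome
            results.append(value)
        return results, pos
    return parse


def alt(p, q):
    """Ordered choice: try p, and only if it fails try q."""
    def parse(text, pos):
        outcome = p(text, pos)
        return outcome if outcome is not None else q(text, pos)
    return parse


def rep_mn(lo, hi, p):
    """Apply p between lo and hi times, preferring the longest match."""
    return reduce(alt, [rep_n(k, p) for k in range(hi, lo - 1, -1)])


two_to_three_a = rep_mn(2, 3, literal("a"))
print(two_to_three_a("a", 0))
print(two_to_three_a("aa", 0))
print(two_to_three_a("aaaa", 0))
```

Note that on "aaaa" the parser consumes three characters and stops, leaving the rest of the input for whatever parser follows.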
I am working on a language in F# and upon testing, I find that the runtime spends over 90% of its time comparing for equality. Because of that the language is so slow as to be unusable. During instrumentation, the GetHashCode function shows fairly high up on the list as a source of overhead. What is going on is that during method calls, I am using method bodies (Expr) along with the call arguments as keys in a dictionary and that triggers repeated traversals over the AST segments.
To improve performance I'd like to add memoization nodes in the AST.
type Expr =
| Add of Expr * Expr
| Lit of int
| HashNode of int * Expr
In the above simplified example, what I would like is for the HashNode to carry the hash of its Expr, so that GetHashCode does not have to travel any deeper into the AST in order to calculate it.
Having said that, I am not sure how I should override the GetHashCode method. Ideally, I'd like to reuse the built-in hash method and make it ignore only the HashNode somehow, but I am not sure how to do that.
More likely, I am going to have to make my own hash function, but unfortunately I know nothing about hash functions so I am a bit lost right now.
An alternative idea that I have would be to replace nodes with unique IDs while keeping that hash function as it is, but that would introduce additional complexities into the code that I'd rather avoid unless I have to.
I needed a similar thing recently in TheGamma (GitHub) where I build a dependency graph (kind of like AST) that gets recreated very often (when you change code in editor and it gets re-parsed), but I have live previews that may take some time to calculate, so I wanted to reuse as much of the previous graph as possible.
The way I'm doing that is that I attach a "symbol" to each node. Two nodes with the same symbol are equal, which I think you could use for efficient equality testing:
type Expr =
  | Add of ExprNode * ExprNode
  | Lit of int
and ExprNode(expr:Expr, symbol:int) =
  member x.Expression = expr
  member x.Symbol = symbol
  override x.GetHashCode() = symbol
  override x.Equals(y) =
    match y with
    | :? ExprNode as y -> y.Symbol = x.Symbol
    | _ -> false
I do keep a cache of nodes - the key is some code of the node kind (0 for Lit, 1 for Add, etc.) and symbols of all nested nodes. For literals, I also add the number itself, which means that creating the same literal twice will give you the same node. So creating a node looks like this:
let node expr ctx =
  // Get the key from the kind of the expression
  // and symbols of all nested nodes in this expression
  let key =
    match expr with
    | Lit n -> [0; n]
    | Add(e1, e2) -> [1; e1.Symbol; e2.Symbol]
  // Return either a node from cache or create a new one
  match ListDictionary.tryFind key ctx with
  | Some res -> res
  | None ->
      let res = ExprNode(expr, nextId())
      ListDictionary.set key res ctx
      res
The ListDictionary module is a mutable dictionary where the key is a list of integers and nextId is the usual function to generate next ID:
open System.Collections.Generic

type ListDictionaryNode<'K, 'T> =
  { mutable Result : 'T option
    Nested : Dictionary<'K, ListDictionaryNode<'K, 'T>> }

type ListDictionary<'K, 'V> = Dictionary<'K, ListDictionaryNode<'K, 'V>>

module ListDictionary =
  let tryFind ks dict =
    let rec loop ks node =
      match ks, node with
      | [], { Result = Some r } -> Some r
      | k::ks, { Nested = d } when d.ContainsKey k -> loop ks (d.[k])
      | _ -> None
    loop ks { Nested = dict; Result = None }

  let set ks v dict =
    let rec loop ks (dict:ListDictionary<_, _>) =
      match ks with
      | [] -> failwith "Empty key not supported"
      | k::ks ->
          if not (dict.ContainsKey k) then
            dict.[k] <- { Nested = Dictionary<_, _>(); Result = None }
          if List.isEmpty ks then dict.[k].Result <- Some v
          else loop ks (dict.[k].Nested)
    loop ks dict

// A ref cell is used because closures cannot capture `let mutable` locals
let nextId =
  let id = ref 0
  fun () -> id.Value <- id.Value + 1; id.Value
So, I guess I'm saying that you'll need to implement your own caching mechanism, but this worked quite well for me and may hint at how to do this in your case!
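For readers who want the idea without the F# details, here is a minimal Python sketch of the same node-caching scheme. It is an illustration under stated assumptions, not a port of the code above: the class names `ExprNode` and `NodeCache` and the tuple-shaped expressions are made up for this example, and a flat dictionary keyed by tuples stands in for the nested ListDictionary.

```python
import itertools


class ExprNode:
    """An expression plus an integer symbol; equality and hashing use only the symbol."""

    __slots__ = ("expr", "symbol")

    def __init__(self, expr, symbol):
        self.expr = expr
        self.symbol = symbol

    def __eq__(self, other):
        return isinstance(other, ExprNode) and other.symbol == self.symbol

    def __hash__(self):
        return self.symbol


class NodeCache:
    """Hands out one shared node per structurally distinct expression."""

    def __init__(self):
        self._nodes = {}
        self._ids = itertools.count(1)

    def lit(self, n):
        # Literals are keyed by their kind and their value.
        return self._intern(("Lit", n), ("Lit", n))

    def add(self, left, right):
        # Compound nodes are keyed by their kind and the symbols of their children,
        # so building the key never walks deeper than one level.
        return self._intern(("Add", left.symbol, right.symbol), ("Add", left, right))

    def _intern(self, key, expr):
        node = self._nodes.get(key)
        if node is None:
            node = ExprNode(expr, next(self._ids))
            self._nodes[key] = node
        return node


cache = NodeCache()
a = cache.add(cache.lit(1), cache.lit(2))
b = cache.add(cache.lit(1), cache.lit(2))
c = cache.add(cache.lit(2), cache.lit(1))
print(a is b, a == c, hash(a) == a.symbol)
```

Because structurally equal expressions always receive the same node, comparing or hashing a node costs a single integer operation regardless of how deep the tree is.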
The book "Programming in Scala" introduces the rules of semicolon inference:
In short, a line ending is treated as a semicolon unless one of the following conditions is true:
The line in question ends in a word that would not be legal as the end of a statement, such as a period or an infix operator.
The next line begins with a word that cannot start a statement.
The line ends while inside parentheses (...) or brackets [...], because these cannot contain multiple statements anyway.
But I can't find an example of the second condition. Can anyone give one?
I tried the following code, because * cannot start a statement, but it failed!
1 * 2
But I can't find an example of the second condition. Can anyone give one?
According to the SLS:
The tokens that can begin a statement are all Scala tokens except the following delimiters and reserved words:
So, one example could be:
return 42
.toString()
This is equivalent to
return 42.toString(); // returns the `String` "42"
and not
return 42; // returns the `Int` 42
.toString() // dead code
I tried the following code, because * cannot start a statement, but it failed!
1 * 2
What makes you think that * cannot start a statement? Please, re-read the spec carefully. A method call is perfectly legal starting a statement:
is valid, and so is
Ergo, * can start a statement. Full example:
object Test
def test = {
1 * 2
def *(x: Int) = {
x + 1
// 3
//=> res0: Int = 4
On compiling the following code with Scala 2.7.3,
package spoj

object Prime1 {
  def main(args: Array[String]) {
    def isPrime(n: Int) = (n != 1) && (2 to n/2 forall (n % _ != 0))
    val read = new java.util.Scanner(System.in)
    var nTests = read nextInt // [*]
    while(nTests > 0) {
      val (start, end) = (read nextInt, read nextInt)
      start to end filter(isPrime(_)) foreach println
      nTests -= 1
    }
  }
}
I get the following compile-time error:
PRIME1.scala:8: error: illegal start of simple expression
while(nTests > 0) {
PRIME1.scala:14: error: block must end in result expression, not in definition
two errors found
When I add a semicolon at the end of the line marked [*], the program compiles fine. Can anyone please explain why Scala's semicolon inference fails to work on that particular line?
It is because Scala assumes that you are using the syntax a foo b (equivalent to a.foo(b)) in your call to nextInt. That is, it assumes that the while loop is the argument to nextInt (recall that every expression has a type), and hence the last statement is a declaration:
var nTests = read nextInt x
where x is your while block.
I must say that, as a point of preference, I've now returned to using the usual a.foo(b) syntax over a foo b unless specifically working with a DSL which was designed with that use in mind (like actors' a ! b). It makes things much clearer in general and you don't get bitten by weird stuff like this!
Additional comment to the answer by oxbow_lakes...
var nTests = read nextInt()
should fix things for you as an alternative to the semicolon.
To add a little more about semicolon inference: Scala actually does this in two stages. First it infers a special token, called nl by the language spec. The parser allows nl to be used as a statement separator, just like a semicolon. However, nl is also permitted in a few other places by the grammar. In particular, a single nl is allowed after an infix operator when the first token on the next line can start an expression -- and while can start an expression, which is why the parser interprets it that way. Unfortunately, although while can start an expression, a while statement cannot be used in an infix expression, hence the error. Personally, I find it a rather quirky way for the parser to work, but there's quite plausibly a sane rationale behind it for all I know!
As yet another option besides the ones already suggested, putting a blank line between your [*] line and the while line will also fix the problem: only a single nl is permitted after an infix operator, so multiple nls force a different interpretation by the parser.