How to simulate inheritance in Clojure? - scala

I recently started looking into Clojure, so admittedly I do not have much
experience nor have I read every book about it. Still, I am having a hard
time figuring out how to expand the behavior of systems coded in Clojure.
To be more specific, for educational purposes, some time ago I implemented
parsers in Scala for a family of small languages -- the NAND-CIRC language
family, similar to how it is defined in “Introduction to Theoretical Computer
Science”, by Boaz Barak. There is a
pure version of the language, and then there are successive syntax sugars that
can be added (inline functions, user-defined functions, if/else, and for).
In Scala -- using the parser combinators library -- I could define the basic
grammar as a class, as shown below (the semantic actions are omitted for
brevity).
import scala.util.parsing.combinator._
class NandCircParsers extends RegexParsers {
  def program: Parser[Any] = block
  def block: Parser[Any] = rep(command)
  def command: Parser[Any] = opt(statement) ~ rep(eol)
  def statement: Parser[Any] = assign
  def assign: Parser[Any] = reference ~ "=" ~ funcall
  def reference: Parser[Any] = outputPort | variable
  def funcall: Parser[Any] = identifier ~ "(" ~ actualArgs ~ ")"
  def actualArgs: Parser[Any] = repsep(expression, ",")
  def expression: Parser[Any] = inputPort | reference
  def variable: Parser[Any] = identifier
  def inputPort: Parser[Any] = "X" ~ "[" ~ index ~ "]"
  def outputPort: Parser[Any] = "Y" ~ "[" ~ index ~ "]"
  def identifier: Parser[Any] = """[_a-zA-Z$][_a-zA-Z0-9'$]*""".r
  def index: Parser[Any] = number
  def number: Parser[Any] = """[-+]?[0-9]+""".r
}
Then I could create an instance of the class NandCircParsers and use it to
process the vanilla version of the language.
But another dialect of the NAND-CIRC language allows the use of inline
functions. For that, the production rule for expression must change. In Scala
that is a matter of creating a subclass and overriding the method in question.
class NandCircInlineParsers extends NandCircParsers {
  override def expression: Parser[Any] = inputPort | reference | funcall
}
And then I can instantiate the class NandCircInlineParsers and use the new
dialect without having to rewrite all the grammar, i.e., the parser hierarchy.
For the dialect with if/else syntax sugar (only) the situation is similar.
class NandCircIfParsers extends NandCircParsers {
  override def statement: Parser[Any] = assign | ifSttmt
  def ifSttmt: Parser[Any] =
    "if" ~ expression ~ ":" ~ block ~ opt("else" ~ ":" ~ block) ~ "end"
}
I just have to override the production rule of the grammar (method) that changed
and add the new ones and I have a parser for the new dialect.
But this is in Scala. Now I am trying to achieve an equivalent result with Clojure.
I have implemented a parser combinators library along the lines of what
was done for the parsec.el Emacs Lisp parser combinator library, and for my
needs, it is working.
The parser for the vanilla version of the NAND-CIRC language becomes
something like this.
(def EQU (token (literal "=")))
(def COMMA (token (literal ",")))
(def LPAR (token (literal "(")))
(def RPAR (token (literal ")")))
(def LBKT (token (literal "[")))
(def RBKT (token (literal "]")))
(def IN (token (literal "X")))
(def OUT (token (literal "Y")))
(def ident (token (regex #"[_a-zA-Z$][_a-zA-Z0-9'$]*")))
(def number (token (regex #"[-+]?[0-9]+")))
(def variable ident)
(def index number)
(def input (do* IN LBKT index RBKT))
(def output (do* OUT LBKT index RBKT))
(def expr (choice input output variable))
(def actual-args (sep-by expr COMMA))
(def funcall (do* ident LPAR actual-args RPAR))
(def formal-args (sep-by variable COMMA))
(def reference (either output variable))
(def assign (do* reference EQU funcall))
(def command (do* (optional assign) (many eol)))
(def statement command)
(def program (many command))
Readability issues aside, my question is: how can I achieve the same level of
code reuse that I have in Scala with Clojure? How can I make this design modular
enough that I only need to change or add the necessary rules to get a new dialect
of the language? Right now, all I can think about -- that does not involve
implementing an object system with inheritance -- is to duplicate the entire
grammar.
Does Clojure have any resources that can facilitate this?
Thanks.

This sounds very much like something that could be implemented using maps. The basic grammar NandCircParsers could be:
(def NandCircParsers
  {:program program
   :block block
   :command command
   :statement statement
   ... ...})
From this grammar, we can create an extended grammar NandCircIfParsers that uses merge to inherit from NandCircParsers. Something like this, maybe:
(def NandCircIfParsers
  (let [ifSttmt ...]
    (merge NandCircParsers
           {:statement (choice (:assign NandCircParsers) ifSttmt)
            :ifSttmt ifSttmt})))

Related

What does some syntax mean in Scala?

What is the meaning and purpose of the syntactic sugar in this code?
def exp: Parser[Expr] = operands ~ binOp ~ exp ^^ {case e1~o~e2=>BinaryOp(o,e1,e2)}
In particular, what do each of these expressions mean?
operands ~ binOp ~ exp ^^
e1~o~e2
operands ~ binOp ~ exp ^^ ...
Operators in Scala are just ordinary method calls:
operands ~ binOp ~ exp ^^ ...
is the same as
operands.~(binOp).~(exp).^^(...)
You can see the documentation for the ~ and ^^ methods here, or you should be able to click through to them in your IDE.
case e1~o~e2
This is matching a case class called ~. Lots of two-parameter things can be written in this "infix notation" in Scala. Because infix patterns associate to the left, just like the ~ method calls that built the value, it's equivalent to:
case ~(~(e1, o), e2)
(see the documentation on case classes)
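To make the desugaring concrete, here is a small sketch. It assumes the scala-parser-combinators module is on the classpath; the object TildeDemo and the rule names digit, op, sugared and desugared are made up for illustration. Both rules build the same parser, once with operator syntax and once with explicit method calls and the nested ~ pattern.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical demo grammar: a single digit, an operator, a single digit.
object TildeDemo extends RegexParsers {
  def digit: Parser[String] = """\d""".r
  def op: Parser[String] = "+" | "-"

  // Operator syntax with the infix ~ pattern.
  def sugared: Parser[String] =
    digit ~ op ~ digit ^^ { case a ~ o ~ b => s"$o($a,$b)" }

  // The same parser with explicit method calls and the nested,
  // left-associated constructor pattern.
  def desugared: Parser[String] =
    digit.~(op).~(digit).^^ { case ~(~(a, o), b) => s"$o($a,$b)" }
}
```

Calling TildeDemo.parseAll(TildeDemo.sugared, "1 + 2") and the same with desugared should give the same result, since the two definitions differ only in notation.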
Those don't have any special meaning - they're just methods that are called ~ and ^^. You'll need to look at the documentation/implementation of whichever library you're using that defined them to figure out what they do.

Parsing a list of 0 or more idents followed by ident

I want to parse a part of my DSL formed like this:
configSignal: sticky Config
Semantically this is:
argument_name: 0_or_more_modifiers argument_type
I tried implementing the following parser:
def parser = ident ~ ":" ~ rep(ident) ~ ident ^^ {
case name ~ ":" ~ modifiers ~ returnType => Arg(name, returnType, modifiers)
}
Thing is, the rep(ident) part is applied until there are no more tokens and the parser fails, because the last ~ ident doesn't match. How should I do this properly?
Edit
In the meantime I realized that the modifiers will be reserved words (keywords), so now I have:
def parser = ident ~ ":" ~ rep(modifier) ~ ident ^^ {
case name ~ ":" ~ modifiers ~ returnType => Arg(name, returnType, modifiers)
}
def modifier = "sticky" | "control" | "count"
Nevertheless, I'm curious if it would be possible to write a parser if the modifiers weren't defined up front.
"0 or more idents followed by ident" is equivalent to "1 or more idents", so just use rep1.
Its docs:
def rep1[T](p: ⇒ Parser[T]): Parser[List[T]]
A parser generator for non-empty repetitions.
rep1(p) repeatedly uses p to parse the input until p fails -- p must succeed at least once (the result is a List of the consecutive results of p)
p a Parser that is to be applied successively to the input
returns A parser that returns a list of results produced by repeatedly applying p to the input (and that only succeeds if p matches at least once).
Edit in response to OP's comment:
I don't think there's a built-in way to do what you described, but it would still be relatively easy to map to your custom data types by using regular List methods:
def parser = ident ~ ":" ~ rep1(ident) ^^ {
case name ~ ":" ~ idents => Arg(name, idents.last, idents.dropRight(1))
}
In this particular case, you wouldn't have to worry about idents being Nil, since the rep1 parser only succeeds with a non-empty list.
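For completeness, a self-contained sketch of the rep1 approach. It assumes the scala-parser-combinators module is available and uses JavaTokenParsers for ident; the Arg case class and the parseArg helper are guesses for illustration, since the question does not show the real definitions.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical data type: name, type, then the modifiers in source order.
case class Arg(name: String, argType: String, modifiers: List[String])

object ArgParser extends JavaTokenParsers {
  // name ":" followed by one or more identifiers; the last identifier is
  // the type and everything before it is a modifier.
  def arg: Parser[Arg] = ident ~ (":" ~> rep1(ident)) ^^ {
    case name ~ idents => Arg(name, idents.last, idents.init)
  }

  // Hypothetical helper: Some(result) on a full match, None otherwise.
  def parseArg(input: String): Option[Arg] = parseAll(arg, input) match {
    case Success(result, _) => Some(result)
    case _                  => None
  }
}
```

With this, ArgParser.parseArg("configSignal: sticky Config") splits the identifiers into the modifier list List("sticky") and the type "Config", while an input with nothing after the colon is rejected.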

Parser Combinators - Ordered Choice and Left-Recursion

What is the significance of ordered choice? Does it simply mean that you put the longest pattern match first?
Let's say you had this expression:
val expr = "eat" ~ "more" ~ "beans" |
"eat" ~ "more" ~ "beans" ~ "and" ~ "fruit"
Since parser combinators use Ordered Choice, the string eat more beans and soup ... would result in matching on the first line? The val expr uses Ordered Choice poorly since it includes a less-specific expression first?
Also, what is left recursion?
Scala's parser combinators implement parsing expression grammars (PEGs). A PEG is predicated on the availability of infinite lookahead and backtracking capabilities, which makes it easier to express grammars, as it is not necessary to make a unilateral decision at any point in the parsing process.
Ordered choice/alternation can be considered the primary enabler of this behavior: the alternatives of a production are tried in order, and the first one that matches the input is accepted. In your example above, the second choice will never be matched, because any input matching the second choice would already be accepted by the first choice.
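A small sketch of the effect of ordering, assuming the scala-parser-combinators module is on the classpath. The object OrderedChoiceDemo and its rule names are invented for this example. When the whole input has to be consumed, putting the shorter alternative first makes the longer phrase unparseable, while the reverse order handles both.

```scala
import scala.util.parsing.combinator.RegexParsers

object OrderedChoiceDemo extends RegexParsers {
  def shorter: Parser[Any] = "eat" ~ "more" ~ "beans"
  def longer: Parser[Any]  = "eat" ~ "more" ~ "beans" ~ "and" ~ "fruit"

  // Less specific alternative first: it wins and stops after "beans".
  def shortFirst: Parser[Any] = shorter | longer
  // More specific alternative first: falls back to shorter only on failure.
  def longFirst: Parser[Any] = longer | shorter

  // Hypothetical helper: true only if p consumes the entire input.
  def accepts(p: Parser[Any], input: String): Boolean =
    parseAll(p, input).successful
}
```

Here accepts(shortFirst, "eat more beans and fruit") is false because the first alternative succeeds and leaves "and fruit" unconsumed, whereas longFirst accepts both the long and the short phrase.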
Left recursion occurs in the event that given a production of the form a = b, an expansion of b begins with a. Consider:
def a = b ~ c
def b = a ~ c
Expansion (matching) of the production a proceeds as follows:
b ~ c
(a ~ c) ~ c // substituting b
((b ~ c) ~ c) ~ c // substituting a
(((a ~ c) ~ c) ~ c) ~ c // substituting b
This is effectively infinite, unterminated recursion.

parsing recursive structures in scala

I'm trying to construct a parser in Scala which can parse simple SQL-like strings. I've got the basics working and can parse something like:
select id from users where name = "peter" and age = 30 order by lastname
But now I wondered how to parse nested "and" clauses, i.e.
select name from users where name = "peter" and (age = 29 or age = 30)
The current production of my 'combinedPredicate' looks like this:
def combinedPredicate = predicate ~ ("and"|"or") ~ predicate ^^ {
case l ~ "and" ~ r => And(l,r)
case l ~ "or" ~ r => Or(l,r)
}
I tried recursively referencing the combinedPredicate production within itself, but that results in a stack overflow.
btw, I'm just experimenting here... not implementing the entire ansi-99 spec ;)
Well, recursion has to be delimited somehow. In this case, you could do this:
def combinedPredicate = predicate ~ rep( ("and" | "or" ) ~ predicate )
def predicate = "(" ~ combinedPredicate ~ ")" | simplePredicate
def simplePredicate = ...
So it will never stack overflow because, to recurse, it first has to accept a character. This is the important part -- if you always ensure recursion won't happen without first accepting a character, you'll never get into an infinite recursion. Unless, of course, you have infinite input. :-)
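Here is one way the grammar above might be fleshed out into something runnable. This is only a sketch: it assumes the scala-parser-combinators module, and the Clause hierarchy, the shape of simplePredicate, and the left-to-right folding of and/or (with no precedence between them) are my own assumptions, not something taken from the question.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical AST for the where-clause.
sealed trait Clause
case class Eq(field: String, value: String) extends Clause
case class And(l: Clause, r: Clause) extends Clause
case class Or(l: Clause, r: Clause) extends Clause

object WhereParser extends JavaTokenParsers {
  // One predicate followed by any number of ("and" | "or") predicate pairs,
  // folded left to right into a tree.
  def combinedPredicate: Parser[Clause] =
    predicate ~ rep(("and" | "or") ~ predicate) ^^ {
      case first ~ rest =>
        rest.foldLeft(first) {
          case (acc, "and" ~ next) => And(acc, next)
          case (acc, _ ~ next)     => Or(acc, next)
        }
    }

  // Recursion only happens after a "(" has been consumed.
  def predicate: Parser[Clause] =
    "(" ~> combinedPredicate <~ ")" | simplePredicate

  // Assumed shape: field = number or field = "string" (quotes are kept).
  def simplePredicate: Parser[Clause] =
    ident ~ ("=" ~> (stringLiteral | wholeNumber)) ^^ { case f ~ v => Eq(f, v) }
}
```

Parsing "a = 1 and (b = 2 or c = 3)" with WhereParser.parseAll(WhereParser.combinedPredicate, ...) builds an And whose right side is the parenthesized Or, and it terminates because every recursive step first consumes an opening parenthesis.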
The stack overflow you're experiencing is probably the result of a left-recursive language:
def combinedPredicate = predicate ~ ...
def predicate = combinedPredicate | ...
The parser combinators in Scala 2.7 are recursive descent parsers. Recursive descent parsers have problems with left recursion because expanding the rule calls the same rule again before any terminal symbol has been consumed. Other kinds of parsers, like Scala 2.8's packrat parser combinators, will have no problem with this, though you'll need to define the grammar using lazy vals instead of methods, like so:
lazy val combinedPredicate = predicate ~ ...
lazy val predicate = combinedPredicate | ...
Alternatively, you could refactor the language to avoid left recursion. From the example you're giving me, requiring parentheses in this language could solve the problem effectively.
def combinedPredicate = predicate ~ ...
def predicate = "(" ~> combinedPredicate <~ ")" | ...
Now each deeper level of recursion corresponds to another parentheses parsed. You know you don't have to recurse deeper when you run out of parentheses.
After reading about solutions for operator precedence, I came up with the following:
def clause:Parser[Clause] = (predicate|parens) * (
"and" ^^^ { (a:Clause, b:Clause) => And(a,b) } |
"or" ^^^ { (a:Clause, b:Clause) => Or(a,b) } )
def parens:Parser[Clause] = "(" ~> clause <~ ")"
Which is probably just another way of writing what @Daniel wrote ;)

Scala combinator parsers - distinguish between number strings and variable strings

I'm doing Cay Horstmann's combinator parser exercises, I wonder about the best way to distinguish between strings that represent numbers and strings that represent variables in a match statement:
def factor: Parser[ExprTree] = (wholeNumber | "(" ~ expr ~ ")" | ident) ^^ {
case a: wholeNumber => Number(a.toInt)
case a: String => Variable(a)
}
The second line there, "case a: wholeNumber" is not legal. I thought about a regexp, but haven't found a way to get it to work with "case".
I would split it up a bit and push the case analysis into the |. This is one of the advantages of combinators and really LL(*) parsing in general:
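def factor: Parser[ExprTree] = ( wholeNumber ^^ { s => Number(s.toInt) }
                               | "(" ~> expr <~ ")"
                               | ident ^^ { Variable(_) } )
I apologize if you're not familiar with the underscore syntax. Basically it just means "substitute the nth parameter to the enclosing function value". Thus { Variable(_) } is equivalent to { x => Variable(x) }. The number case uses an explicit parameter because the underscore binds to the innermost expression: Number(_.toInt) would mean Number(x => x.toInt), which does not type-check.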
Another bit of syntax magic here is the ~> and <~ operators in place of ~. These operators mean that the parsing of that term should include the syntax of both the parens, but the result should be solely determined by the result of expr. Thus, the "(" ~> expr <~ ")" matches exactly the same thing as "(" ~ expr ~ ")", but it doesn't require the extra case analysis to retrieve the inner result value from expr.
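Putting this together, a compilable sketch might look like the following. It assumes the scala-parser-combinators module; the ExprTree hierarchy, the Sum node and the very small expr rule (factors joined by +) are stand-ins I made up, since the exercise's real definitions are not shown here.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical tree types standing in for the exercise's own definitions.
sealed trait ExprTree
case class Number(value: Int) extends ExprTree
case class Variable(name: String) extends ExprTree
case class Sum(left: ExprTree, right: ExprTree) extends ExprTree

object ExprParser extends JavaTokenParsers {
  // Assumed minimal expression rule: factors separated by "+".
  def expr: Parser[ExprTree] = factor ~ rep("+" ~> factor) ^^ {
    case first ~ rest => rest.foldLeft(first)(Sum(_, _))
  }

  // Each alternative converts its own result, so no type test is needed.
  def factor: Parser[ExprTree] =
    wholeNumber ^^ { s => Number(s.toInt) } |
    "(" ~> expr <~ ")" |
    ident ^^ { Variable(_) }

  // Hypothetical helper: Some(tree) on a full match, None otherwise.
  def parseExpr(input: String): Option[ExprTree] =
    parseAll(expr, input) match {
      case Success(tree, _) => Some(tree)
      case _                => None
    }
}
```

For example, ExprParser.parseExpr("(1 + x)") yields a Sum of a Number and a Variable, with the parentheses matched but dropped from the result by ~> and <~.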