Parser combinator grammar not yielding correct associativity - scala

I am working on a simple expression parser. However, with the parser combinator declarations below, I can't get my tests to pass: a right-associative tree keeps popping up.
def EXPR:Parser[E] = FACTOR ~ rep(SUM|MINUS) ^^ {case a~b => (a /: b)((acc,f) => f(acc))}
def SUM:Parser[E => E] = "+" ~ EXPR ^^ {case "+" ~ b => Sum(_, b)}
def MINUS:Parser[E => E] = "-" ~ EXPR ^^ {case "-" ~ b => Diff(_, b)}
I've been debugging this for hours. I hope someone can help me figure out why it's not coming out right.
"5-4-3" would yield a tree that evaluates to 4 instead of the expected -2.
What is wrong with the grammar above?

I don't work with Scala, but I do work with F# parser combinators and also needed associativity with infix operators. While I am sure you can parse 5-4 or 2+3, the problem comes in with a sequence of two or more operators of the same precedence, i.e. 5-4-2 or 2+3+5. The problem won't show up with addition, as (2+3)+5 = 2+(3+5), but (5-4)-2 <> 5-(4-2), as you know.
See: Monadic Parser Combinators, section 4.3, "Repetition with meaningful separators". Note: the separators here are operators such as "+" and "*", not whitespace or commas.
See: Functional Parsers; look for the chainl and chainr parsers in section 7, "More parser combinators".
For example, in arithmetical expressions, where the operators that separate the subexpressions have to be part of the parse tree. For this case we will develop the functions chainr and chainl. These functions expect that the parser for the separators yields a function (!). The function $f$ should operate on an element and a list of tuples, each containing an operator and an element. For example, $f(e_0, [(\oplus_1, e_1), (\oplus_2, e_2), (\oplus_3, e_3)])$ should return $((e_0 \oplus_1 e_1) \oplus_2 e_2) \oplus_3 e_3$. You may recognize a version of foldl in this (albeit an uncurried one), where a tuple $(\oplus, y)$ from the list and intermediate result $x$ are combined applying $x \oplus y$.
You need a fold function in the semantic parser, i.e. the part that converts the tokens from the syntactic parser into the output of the parser. In your code I believe it is this part.
{case a~b => (a /: b)((acc,f) => f(acc))}
Sorry I can't do better as I don't use Scala.
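A minimal stdlib sketch of the two folds the quoted text describes, assuming the input has already been split into a first element plus a list of (operator, element) pairs. The names Chains, chainl and chainr are illustrative only; they are not the Scala parser-combinator API.

```scala
object Chains {
  type Op[A] = (A, A) => A

  // Left-associative: ((e0 op1 e1) op2 e2) ...
  // This is a plain foldLeft over the (operator, operand) pairs.
  def chainl[A](e0: A, rest: List[(Op[A], A)]): A =
    rest.foldLeft(e0) { case (acc, (op, e)) => op(acc, e) }

  // Right-associative: e0 op1 (e1 op2 (e2 ...)).
  // Each operator combines its left operand with everything to its right.
  def chainr[A](e0: A, rest: List[(Op[A], A)]): A = rest match {
    case Nil              => e0
    case (op, e1) :: tail => op(e0, chainr(e1, tail))
  }

  // 5 - 4 - 3 as a first element plus (operator, operand) pairs.
  val minus: Op[Int] = _ - _
  val pairs: List[(Op[Int], Int)] = List((minus, 4), (minus, 3))
}
```

chainl gives (5 - 4) - 3 = -2, while chainr gives 5 - (4 - 3) = 4, which is exactly the wrong result from the question.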

"-" ~ EXPR ^^ {case "-" ~ b => Diff(_, b)}
For 5-4-3, it expands to
Diff(5, 4-3)
which is
Diff(5, Diff(4, 3))
however, what you need is:
Diff(Diff(5, 4), 3)
// for 5 + 4 - 3 it should be
Diff(Sum(5, 4), 3)
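To get that shape you need to combine the operands from left to right, for example with an explicit stack or a left fold over the operator/operand pairs, rather than recursing into EXPR on the right-hand side.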
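To see the difference concretely, here is a small sketch that evaluates both tree shapes. Sum and Diff mirror the question's constructors; Num, eval and the Trees wrapper are assumptions added for the demo.

```scala
object Trees {
  sealed trait E
  final case class Num(n: Int) extends E
  final case class Sum(l: E, r: E) extends E
  final case class Diff(l: E, r: E) extends E

  // Straightforward recursive evaluator.
  def eval(e: E): Int = e match {
    case Num(n)     => n
    case Sum(l, r)  => eval(l) + eval(r)
    case Diff(l, r) => eval(l) - eval(r)
  }

  // What "-" ~ EXPR produces for 5-4-3: 5 - (4 - 3)
  val rightNested: E = Diff(Num(5), Diff(Num(4), Num(3)))
  // What we want for 5-4-3: (5 - 4) - 3
  val leftNested: E = Diff(Diff(Num(5), Num(4)), Num(3))
  // What we want for 5 + 4 - 3: (5 + 4) - 3
  val mixed: E = Diff(Sum(Num(5), Num(4)), Num(3))
}
```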

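It seems using "+" ~ EXPR (and "-" ~ EXPR) made the answer incorrect: recursing into EXPR after each operator makes the grammar right-recursive, so 5-4-3 is parsed as 5-(4-3). It should have been FACTOR instead; then rep(SUM|MINUS) collects a flat list of operations, and the existing fold applies them from left to right.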
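A sketch of the corrected grammar, assuming the scala-parser-combinators module (scala.util.parsing.combinator, a separate dependency since Scala 2.11) is on the classpath and that FACTOR is just an unsigned integer literal. Num, eval and parseAndEval are hypothetical helpers; EXPR_CHAINED shows the same grammar written with the library's chainl1 combinator, which packages the "operand, then (operator, operand)*, fold left" pattern.

```scala
import scala.util.parsing.combinator.RegexParsers

object FixedExpr extends RegexParsers {
  sealed trait E
  final case class Num(n: Int) extends E
  final case class Sum(l: E, r: E) extends E
  final case class Diff(l: E, r: E) extends E

  // Hypothetical FACTOR: an unsigned integer literal.
  def FACTOR: Parser[E] = """\d+""".r ^^ (s => Num(s.toInt))

  // The fix: the operand after "+" / "-" is a FACTOR, not a whole EXPR,
  // so no right-nested subtree is built here.
  def SUM: Parser[E => E]   = "+" ~> FACTOR ^^ (b => (a: E) => Sum(a, b))
  def MINUS: Parser[E => E] = "-" ~> FACTOR ^^ (b => (a: E) => Diff(a, b))

  // rep yields a flat list of "apply this operation" functions;
  // foldLeft applies them to the accumulated tree from left to right.
  def EXPR: Parser[E] = FACTOR ~ rep(SUM | MINUS) ^^ {
    case first ~ ops => ops.foldLeft(first)((acc, f) => f(acc))
  }

  // The same grammar using chainl1 with parsers that yield binary functions.
  val plusOp: (E, E) => E  = (a, b) => Sum(a, b)
  val minusOp: (E, E) => E = (a, b) => Diff(a, b)
  def EXPR_CHAINED: Parser[E] = chainl1(FACTOR, "+" ^^^ plusOp | "-" ^^^ minusOp)

  def eval(e: E): Int = e match {
    case Num(n)     => n
    case Sum(l, r)  => eval(l) + eval(r)
    case Diff(l, r) => eval(l) - eval(r)
  }

  // .get throws if parsing fails, which is acceptable for a demo.
  def parseAndEval(p: Parser[E], input: String): Int = eval(parseAll(p, input).get)
}
```

Both EXPR and EXPR_CHAINED build Diff(Diff(Num(5), Num(4)), Num(3)) for "5-4-3", which evaluates to -2.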


Difference between dot and space in Scala

What precisely is the difference between . and a space when used to invoke methods on objects in Scala?
For some reason, I get variations, like:
scala> val l:List[Int] = 1::2::3::Nil
l: List[Int] = List(1, 2, 3)
scala> l foldLeft(0)((hd, nxt) => hd + nxt)
<console>:13: error: Int(0) does not take parameters
| foldLeft(0)((hd, nxt) => hd + nxt)
^
scala> l.foldLeft(0)((hd, nxt) => hd + nxt)
res2: Int = 6
(And while I'm at it, what's the name of that operation? I kept trying to find the strict definition of the . operator and I have no idea what it's called.)
Having space instead of dot is called postfix notation if there are no arguments in the called function on the object, or infix notation if there is an argument that the function requires.
Postfix example: l sum, equivalent to l.sum
Infix example: l map (_ * 2), equivalent to l.map(_ * 2)
The issue with these notations is that they are inherently more ambiguous in their interpretation. A classic example from math:
1 + 2 * 3 + 4 is ambiguous and depends on the priority of the operators.
1.+(2.*(3)).+(4) has only one meaningful interpretation.
Therefore it is not a different operator but the same method call as with the dot, just more susceptible to ambiguity, which can lead to syntax errors like yours or, even worse, logic errors when you chain infix operators.
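A small sketch of the equivalences above, assuming a Scala 2.13 or Scala 3 compiler. Postfix syntax has to be enabled with scala.language.postfixOps; depending on the version and flags, using it without that import produces a feature warning or an error. The Notation object and its values are arbitrary.

```scala
import scala.language.postfixOps // opt in to postfix syntax explicitly

object Notation {
  val l = List(1, 2, 3)

  // Postfix: no argument, so `l sum` means `l.sum`.
  val postfixSum = (l sum)
  val dotSum     = l.sum

  // Infix: one argument, so `l map (_ * 2)` means `l.map(_ * 2)`.
  val infixMap = l map (_ * 2)
  val dotMap   = l.map(_ * 2)

  // Infix arithmetic relies on operator precedence...
  val infixArith = 1 + 2 * 3 + 4
  // ...while the dot form spells the grouping out explicitly.
  val dotArith = 1.+(2.*(3)).+(4)
  // Plain left-to-right grouping would give a different answer.
  val leftToRight = ((1 + 2) * 3) + 4
}
```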
You can actually express foldLeft with infix notation in this way:
(l foldLeft 0)((hd, nxt) => hd + nxt)
or even
(0 /: l)((hd, nxt) => hd + nxt)
Where /: is just an alias for foldLeft and makes use of the special semantics of operators ending in a colon (:), which are interpreted as l./:(0) (the reverse of the usual).
You can see the desugaring by starting the REPL (or scalac) with -Xprint:parser or -Xprint:typer:
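A sketch of the four spellings of the same fold; l is a hypothetical List(1, 2, 3). Note that /: (like :\) has been deprecated since Scala 2.13 in favour of foldLeft, so that line still compiles but emits a deprecation warning.

```scala
object FoldForms {
  val l = List(1, 2, 3)

  val dotted   = l.foldLeft(0)((hd, nxt) => hd + nxt)
  // The parentheses close the infix expression before the second argument list.
  val infix    = (l foldLeft 0)((hd, nxt) => hd + nxt)
  // Ends in a colon, so the method is looked up on the right operand: l./:(0)
  val symbolic = (0 /: l)((hd, nxt) => hd + nxt)
  val explicit = l./:(0)((hd, nxt) => hd + nxt)
}
```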
Example 1 Desugared:
scala> (List(1,2) foldLeft 0)((hd, nxt) => hd + nxt)
...
List(1, 2).foldLeft(0)(((hd, nxt) => hd.$plus(nxt)))
...
immutable.this.List.apply[Int](1, 2).foldLeft[Int](0)(((hd: Int, nxt: Int) => hd.+(nxt)));
As you can see, (List(1,2) foldLeft 0) translates into (List(1, 2).foldLeft(0)) in the parser phase. That expression is then applied to the second set of parentheses to produce a result: foldLeft has two parameter lists, so it behaves like a curried function (a function that takes one argument and returns another function expecting the remaining ones).
Example 2 Desugared:
scala> List(1,2) foldLeft(0)((hd, nxt) => hd + nxt)
...
List(1, 2)(foldLeft(0)(((hd, nxt) => hd.$plus(nxt))))
...
<console>:8: error: not found: value foldLeft
List(1,2) (foldLeft(0)((hd, nxt) => hd + nxt))
The parentheses go around (foldLeft(0)((hd, nxt) => hd + nxt)).
Style:
The intended way to use space-delimited methods is one object, followed by one method, followed by one argument (or one set of parentheses), which produces a new object that can in turn be followed by a new method.
obj method parameter // good
obj method1 parameter1 method2 parameter2 // good
obj method1 parameter1 method2 parameter2 method3 parameter3 // compiles, but might need to be broken up
You can follow an object with a parameterless method in postfix position, but this isn't always the approved style, especially for accessors.
foo.length // good
foo length // compiles, but can be confusing.
Space-delimited methods are normally reserved for either pure functions (like map, flatMap, filter) or for domain-specific languages (DSLs).
In the case of foo.length, there is no () on length, so the whitespace isn't necessary to convey the idea that length is pure.
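A sketch of the "object method argument method argument" style from above; the values are arbitrary. Methods whose names start with a letter all share the same (lowest) precedence and are left-associative, so the infix chain reads left to right just like the dotted version.

```scala
object Chaining {
  val nums = List(1, 2, 3)

  // Parsed as (nums map (_ * 2)) filter (_ > 2)
  val chained = nums map (_ * 2) filter (_ > 2)
  val dotted  = nums.map(_ * 2).filter(_ > 2)
}
```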

How is foldLeft operator implemented in Scala?

Why does the foldLeft operator syntax work? For example, I would expect this code
(10 /: (1 to 5))(_ + _)
to give me an error "value /: is not a member of Int". How does it add the method /: to all types in the type system?
Here is the definition of the "shortcut" operator:
def /:[B](z: B)(op: (B, A) => B): B = foldLeft(z)(op)
If an operator ends with a colon, it is right-associative. 1 :: Nil is another example: there is no method :: on Int.
All of these work:
(1 to 5)./:(10)(_ + _)
((1 to 5) foldLeft 10)(_ + _) (almost the same as your example, but here it's more obvious that foldLeft is actually a method on the range object)
(1 to 5).foldLeft(10)(_ + _)
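A sketch checking that the forms listed above agree, using the question's numbers; the ColonForms wrapper is just for the example. On Scala 2.13 and later the /: lines trigger a deprecation warning.

```scala
object ColonForms {
  val r = 1 to 5

  val symbolic = (10 /: r)(_ + _)       // roughly r./:(10)(_ + _)
  val explicit = r./:(10)(_ + _)
  val infix    = (r foldLeft 10)(_ + _)
  val dotted   = r.foldLeft(10)(_ + _)
}
```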
Your question is not entirely clear (there's no n mentioned in your expression), but: Operators that end with a colon are interpreted as methods on the right-hand argument, not the left. Your expression is equivalent to
(1 to 5)./:(10)(_ + _)
in which /: is more clearly seen to be a method of the Range object on the left.
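The colon rule is purely syntactic, so it is not something added to every type: it applies to any method on any class, collections or not. A minimal sketch with a hypothetical Bag class whose +: method ends in a colon, next to the standard :: for comparison:

```scala
object RightAssoc {
  // Hypothetical wrapper, only to show the rule works for any class.
  final case class Bag(items: List[Int]) {
    // Ends in a colon, so `x +: bag` is resolved as `bag.+:(x)`.
    def +:(x: Int): Bag = Bag(x :: items)
  }

  // Right-associative: 1 +: (2 +: Bag(Nil)), i.e. Bag(Nil).+:(2).+:(1)
  val bag = 1 +: 2 +: Bag(Nil)

  // Same rule for :: on List.
  val consed    = 1 :: 2 :: Nil
  val desugared = Nil.::(2).::(1)
}
```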

Named parameters vs _, dot notation vs infix operation, curly vs round brackets when using higher-order functions in Scala

I'm having the hardest time understanding when I can or can't omit brackets and/or periods, and how this interplays with _.
The specific case I had with this was
val x: X = ???
val xss: List[List[X]] = ???
xss map x :: _ //this doesn't compile
xss map _.::(x) //this is the same as the above (and thus doesn't compile)
the above two seem to be identical to xss.map(_).::(x)
xss map (x :: _) //this works as expected
xss map {x :: _} //this does the same thing as the above
meanwhile, the following also fail:
xss.map xs => x :: xs //';' expected but '=>' found.
xss.map x :: _ //missing arguments for method map in class List; follow this method with `_' if you want to treat it as a partially applied function
//so when I try following the method with _, I get my favourite:
xss.map _ x :: _ //Cannot construct a collection of type That with elements of type B based on a collection of type List[List[Main.X]]
//as opposed to
xss map _ x :: _ //missing parameter type for expanded function ((x$1) => xss.map(x$1).x(($colon$colon: (() => <empty>))))
Right now, I often play "toggle the symbols until it compiles" which I believe to be a suboptimal programming strategy. How does this all work?
First we need to distinguish between xss.map(f) and xss map f. According to Scala Documentation any method which takes a single parameter can be used as an infix operator.
Actually map method in List is one of these methods. Ignoring the full signature and the fact that it's inherited from TraversableLike, the signature is as follows:
final def map[B](f: (A) ⇒ B): List[B]
So it takes a single parameter, namely f, which is a function with type A => B. So if you have a function value defined as
val mySize = (xs:List[Int]) => xs.size
you can choose between
xss.map(mySize)
or
xss map mySize
This is a matter of preference, but according to the Scala Style Guide the latter is preferred in this case, unless it is part of a complex expression, where it's better to stick with dot notation.
Note that if you opt to use the dot notation, you always need to wrap the argument in brackets (or curly braces)! That's why none of the following compiles successfully.
xss.map xs => x :: xs // Won't compile
xss.map x :: _ // Won't compile
xss.map _ x :: _ // Won't compile
But most of the time, instead of passing a function value, you need to pass a function literal (aka anonymous function). In this case, again, if you use the dot notation you need something like xss.map(_.size). But if you use the infix notation, it becomes a matter of how the expression is grouped.
For example
xss map x :: _ // Won't compile!
does not work because, without brackets, the placeholder _ expands to the whole expression, giving y => xss.map(x :: y) instead of xss.map(y => x :: y). So you need brackets to disambiguate the situation for the compiler: xss map (x :: _).
The use of curly braces instead of brackets has a very clear and simple rule. Again, any method that takes a single parameter can be applied with curly braces instead of brackets, with both infix and dot notation. So the following statements will compile.
xss.map{x :: _}
xss map {x :: _}
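A sketch of the forms that do compile, assuming X is Int and picking arbitrary values for x and xss (the question leaves them as ???). mySize is the function value from the answer.

```scala
object MapForms {
  type X = Int // assumption: the question's X, fixed to Int for the demo
  val x: X = 0
  val xss: List[List[X]] = List(List(1), List(2, 3))

  // Brackets or braces keep the placeholder local: xss.map(xs => x :: xs)
  val withParens = xss map (x :: _)
  val withBraces = xss map { x :: _ }
  val dotted     = xss.map(xs => x :: xs)

  // A function value needs no placeholder at all.
  val mySize     = (xs: List[Int]) => xs.size
  val sizesInfix = xss map mySize
  val sizesDot   = xss.map(mySize)
}
```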
To avoid confusion, you can begin with dot notation and explicit types for parameters. Later, after the code compiles (and probably after writing some unit tests for it), you can start refactoring by removing unnecessary types, using infix notation, and using curly braces instead of brackets where it makes sense.
For this purpose you can refer to Scala Style Guide and Martin Odersky's talk in Scala Days 2013 which is concerning Scala coding style. Also you can always ask for help from IDEs for refactoring the code to be more concise.

Scala Vector fold syntax (/: and :\ and /:\)

Can someone provide some examples for how
/: :\ and /:\
Actually get used? I assume they're shortcuts to the reduce / fold methods, but there's no examples on how they actually get used in the Scala docs, and they're impossible to google / search for on StackOverflow.
I personally prefer the /: and :\ forms of foldLeft and foldRight. Two reasons:
It has a more natural feel because you can see that you are pushing a value into the left/right of a collection and applying a function. That is
(1 /: ints) { _ + _ }
ints.foldLeft(1) { _ + _ }
Are both equivalent, but I tend to think the former emphasises my intuition as to what is happening. If you want to know how this is happening (i.e. the method appears to be called on the value 1, not the collection), it's because methods ending in a colon are right-associative. This can be seen in ::, +:, etc. elsewhere in the standard library.
The ordering of the Function2 parameters is the same order as the folded element and that which is folded into:
   (b /: as) { (bb, a) => f(bb, a) }
//  ^    ^      ^   ^
//  B    A      B   A
Better in every way than:
as.foldLeft(b) { (bb, a) => f(bb, a) }
Although I admit that this was a far more important difference in the era before decent IDE support: nowadays IDEA can tell me what function is expected with a simple CTRL-P
I hope it is also obvious how :\ works with foldRight: it's basically the same, except that the value appears to be pushed in from the right-hand side. I must say, I tend to steer well clear of foldRight in Scala because of how it is implemented (i.e. wrongly).
/: is a synonym for foldLeft and :\ for foldRight.
But remember that : makes /: apply to the object to the right of it.
Assuming you know that (_ * _) is an anonymous function that's equivalent to (a, b) => a * b, and the signature of foldLeft and foldRight are
def foldLeft [B] (z: B)(f: (B, A) ⇒ B): B
def foldRight [B] (z: B)(f: (A, B) ⇒ B): B
i.e. they're curried functions taking a start value and a function combining the start value / accumulator with an item from the list, some examples are:
List(1,2,3).foldLeft(1)(_*_)
which is the same as
(1 /: List(1,2,3))(_*_)
And
List(1,2,3).foldRight(1)(_*_)
in infix notation is
(List(1,2,3) foldRight 1)(_*_)
which is the same as
(List(1,2,3) :\ 1)(_*_)
Add your own collections and functions and enjoy!
The thing to remember with the short (/: and :\) notations is that, because you're using the infix notations you need to put parentheses around the first part in order for it to pick up the second argument list properly. Also, remember that the functions for foldLeft and foldRight are the opposite way round, but it makes sense if you're visualising the fold in your head.
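A sketch of the argument order using string concatenation, which is not commutative, so the difference is visible; xs and the start value "z" are arbitrary. The symbolic forms are deprecated since Scala 2.13 but still compile with a warning.

```scala
object FoldOrder {
  val xs = List("a", "b", "c")

  // foldLeft: function is (accumulator, element); z ends up on the far left.
  val left = xs.foldLeft("z")((acc, s) => acc + s)   // (("z" + "a") + "b") + "c"
  // foldRight: function is (element, accumulator); z ends up on the far right.
  val right = xs.foldRight("z")((s, acc) => s + acc) // "a" + ("b" + ("c" + "z"))

  // The symbolic forms need parentheses around the first part.
  val leftSym  = ("z" /: xs)(_ + _)
  val rightSym = (xs :\ "z")(_ + _)

  // The product example from above.
  val product = (1 /: List(1, 2, 3))(_ * _)
}
```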
Rex Kerr has written a nice answer about folds here. Near the end you can see an example of the shortcut syntax for foldLeft and foldRight.

When to use parenthesis in Scala infix notation

When programming in Scala, I do more and more functional stuff. However, when using infix notation it is hard to tell when you need parenthesis and when you don't.
For example the following piece of code:
import scala.io.Source.fromFile // provides fromFile(file) used in encrypt
def caesar(k:Int)(c:Char) = c match {
case c if c isLower => ('a'+((c-'a'+k)%26)).toChar
case c if c isUpper => ('A'+((c-'A'+k)%26)).toChar
case _ => c
}
def encrypt(file:String,k:Int) = (fromFile(file) mkString) map caesar(k)_
The (fromFile(file) mkString) needs parenthesis in order to compile. When removed I get the following error:
Caesar.scala:24: error: not found: value map
def encrypt(file:String,k:Int) = fromFile(file) mkString map caesar(k)_
^
one error found
mkString obviously returns a String, on which (by implicit conversion, AFAIK) I can use the map function.
Why does this particular case need parentheses? Is there a general guideline on when and why you need them?
This is what I put together for myself after reading the spec:
Any method which takes a single parameter can be used as an infix operator: a.m(b) can be written a m b.
Any method which does not require a parameter can be used as a postfix operator: a.m can be written a m.
For instance a.##(b) can be written a ## b and a.! can be written a!
Postfix operators have lower precedence than infix operators, so foo bar baz means foo.bar(baz) while foo bar baz bam means (foo.bar(baz)).bam and foo bar baz bam bim means (foo.bar(baz)).bam(bim).
Also given a parameterless method m of object a, a.m.m is valid but a m m is not as it would parse as exp1 op exp2.
Because there is a version of mkString that takes a single parameter, it will be seen as an infix operator in fromFile(file) mkString map caesar(k)_. There is also a version of mkString that takes no parameter, which can be used as a postfix operator:
scala> List(1,2) mkString
res1: String = 12
scala> List(1,2) mkString "a"
res2: String = 1a2
Sometime by adding dot in the right location, you can get the precedence you need, e.g. fromFile(file).mkString map { }
And all that precedence thing happens before typing and other phases, so even though list mkString map function makes no sense as list.mkString(map).function, this is how it will be parsed.
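A sketch of the grouping rule for a chain of letter-named methods, plus the two ways out of the postfix trap described above; the values are arbitrary, and postfix syntax is enabled explicitly with scala.language.postfixOps.

```scala
import scala.language.postfixOps // needed for the parenthesised postfix call below

object InfixChains {
  val xs = List(1, 2, 3)

  // Same precedence, left-associative: (xs map (_ + 1)) mkString ","
  val joined = xs map (_ + 1) mkString ","

  // Postfix mkString wrapped in parentheses, so `map` is not taken as its argument.
  val upper = (List('a', 'b') mkString) map (_.toUpper)

  // Dot notation for the parameterless call avoids postfix syntax entirely.
  val upper2 = List('a', 'b').mkString map (_.toUpper)
}
```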
The Scala reference mentions (6.12.3: Prefix, Infix, and Postfix Operations)
In a sequence of consecutive type infix operations t0 op1 t1 op2 . . .opn tn, all operators op1, . . . , opn must have the same associativity.
If they are all left-associative, the sequence is interpreted as (. . . (t0 op1 t1) op2 . . .) opn tn.
In your case, 'map' is not meant to be an argument of 'mkString', so you need grouping (with the parentheses around 'fromFile(file) mkString').
Actually, Matt R comments:
It's not really an associativity issue, more that "Postfix operators always have lower precedence than infix operators. E.g. e1 op1 e2 op2 is always equivalent to (e1 op1 e2) op2". (Also from 6.12.3)
huynhjl's answer (upvoted) gives more details, and Mark Bush's answer (also upvoted) points to "A Tour of Scala: Operators" to illustrate that "Any method which takes a single parameter can be used as an infix operator".
Here's a simple rule: never use postfix operators. If you do, put the whole expression ending with the postfix operator inside parentheses.
In fact, starting with Scala 2.10.0, using a postfix operator without importing scala.language.postfixOps generates a warning by default.
For good measure, you might want to move the postfix operator out, and use dot notation for it. For example:
(fromFile(file)).mkString map caesar(k)_
Or, even more simply,
fromFile(file).mkString map caesar(k)_
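A self-contained sketch of the dot-notation fix. To avoid touching the file system it encrypts an in-memory string instead of fromFile(file).mkString, and caesar is rewritten (an assumption, not the asker's code) to return a Char => Char function, to shift only ASCII letters, and to use Math.floorMod so negative shifts also work.

```scala
object Caesar {
  // Returns a function so it can be passed straight to `map`.
  def caesar(k: Int): Char => Char = {
    case c if c >= 'a' && c <= 'z' => ('a' + Math.floorMod(c - 'a' + k, 26)).toChar
    case c if c >= 'A' && c <= 'Z' => ('A' + Math.floorMod(c - 'A' + k, 26)).toChar
    case c                         => c // leave everything else untouched
  }

  // In-memory stand-in for fromFile(file).mkString map caesar(k)_
  def encrypt(text: String, k: Int): String = text map caesar(k)
}
```

Because caesar(k) is already a function value, text map caesar(k) needs neither the trailing _ nor parentheses around the receiver.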
On the other hand, pay attention to the methods where you can supply empty parentheses to turn them into infix:
fromFile(file) mkString () map caesar(k)_
The spec doesn't make it clear, but my experience and experimentation has shown that the Scala compiler will always try to treat method calls using infix notation as infix operators. Even though your use of mkString is postfix, the compiler tries to interpret it as infix and so is trying to interpret "map" as its argument. All uses of postfix operators must either be immediately followed by an expression terminator or be used with "dot" notation for the compiler to see it as such.
You can get a hint of this (although it's not spelled out) in A Tour of Scala: Operators.