Factoring out common cases in pattern matching with partial function - scala

I routinely use partial functions to factor out common clauses in exception handling. For example:
val commonHandler: PartialFunction[Throwable, String] = {
case ex: InvalidClassException => "ice"
}
val tried = try {
throw new Exception("boo")
}
catch commonHandler orElse {
case _:Throwable => "water"
}
println(tried) // water
Naturally, I expected that the match keyword also expects a partial function and that I should be able to do something like this:
val commonMatcher: PartialFunction[Option[_], Boolean] = {
case Some(_) => true
case None => false
}
Some(8) match commonMatcher // Compilation error
What am I doing wrong?

match is a keyword, not a method, and its syntax does not accept a partial function on its right-hand side (see below). However, since Scala 2.13 there is a pipe method which, like map or foreach, takes a function argument, and a PartialFunction is a Function1, so it can be passed there. You can therefore write:
import scala.util.chaining._
Some(8).pipe(commonMatcher)
There was some discussion about this (see Pre SIP: Demote match keyword to a method), and a PR (Change match syntax #7610) made it possible to use match a bit more like a method, with a dot. Even so, the syntax still requires the match keyword to be followed by case clauses; see https://docs.scala-lang.org/scala3/reference/syntax.html:
InfixExpr ::= ... other variants omitted here ...
| InfixExpr MatchClause
SimpleExpr ::= ... other variants omitted here ...
| SimpleExpr ‘.’ MatchClause
MatchClause ::= ‘match’ <<< CaseClauses >>>
CaseClauses ::= CaseClause { CaseClause }
CaseClause ::= ‘case’ Pattern [Guard] ‘=>’ Block
Compare this with catch syntax:
Expr1 ::= ... other variants omitted here ...
| ‘try’ Expr Catches [‘finally’ Expr]
Catches ::= ‘catch’ (Expr | ExprCaseClause)
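As a minimal sketch of the workaround above (assuming Scala 2.13 or later for scala.util.chaining), shared clauses can be composed with orElse and then applied either through pipe or by calling the composed partial function directly. The names specificMatcher, describe and describeDirect are hypothetical and exist only for illustration.

```scala
import scala.util.chaining._

object MatchWithPartialFunction {
  // Shared clauses, factored out once.
  val commonMatcher: PartialFunction[Any, String] = {
    case _: Int => "int"
  }

  // Clauses specific to one call site, ending in a catch-all.
  val specificMatcher: PartialFunction[Any, String] = {
    case _: String => "string"
    case _         => "other"
  }

  // `match` cannot take the composed function, but `pipe` can ...
  def describe(x: Any): String =
    x.pipe(commonMatcher orElse specificMatcher)

  // ... and so can plain function application.
  def describeDirect(x: Any): String =
    (commonMatcher orElse specificMatcher)(x)
}
```

This mirrors the catch commonHandler orElse { ... } idiom from the question, only with the scrutinee passed as an ordinary argument.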

Related

Scala Parser Combinator: recursion

I am writing a parser for boolean expressions, and try to parse input like "true and false"
def boolExpression: Parser[BoolExpression] = boolLiteral | andExpression
def andExpression: Parser[AndExpression] = (boolExpression ~ "and" ~ boolExpression) ^^ {
case b1 ~ "and" ~ b2 => AndExpression(b1, b2)
}
def boolLiteral: Parser[BoolLiteral] = ("true" | "false") ^^ {
s => BoolLiteral(java.lang.Boolean.valueOf(s))
}
The above code does not parse "true and false", since it reads only "true" and applies rule boolLiteral immediately
But if I change the rule boolExpression to this:
def boolExpression: Parser[BoolExpression] = andExpression | boolLiteral
Then, when parsing "true and false", the code throws a StackOverflowError due to endless recursion:
java.lang.StackOverflowError
at Parser.boolExpression(NewParser.scala:58)
at Parser.andExpression(NewParser.scala:62)
at Parser.boolExpression(NewParser.scala:58)
at Parser.andExpression(NewParser.scala:62)
...
How to solve this?
This appears to be parsing a string of boolean constants separated by "and", which is best done using the chainl1 primitive of the parser combinator library. It processes a chain of operations a op b op c op d, associating to the left.
It might look something like this totally untested code:
trait BoolExpression
case class BoolLiteral(value: Boolean) extends BoolExpression
case class AndExpression(l: BoolExpression, r: BoolExpression) extends BoolExpression
def boolLiteral: Parser[BoolLiteral] =
("true" | "false") ^^ { s => BoolLiteral(java.lang.Boolean.valueOf(s)) }
def andExpression: Parser[(BoolExpression, BoolExpression) => BoolExpression] =
"and" ^^ { _ => (l: BoolExpression, r: BoolExpression) => AndExpression(l, r) }
def boolExpression: Parser[BoolExpression] =
chainl1(boolLiteral, andExpression) ^^ { expr => expr }
Presumably the requirement is more complex than this, but the chainl1 parser is a good starting point.
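To make the idea behind chainl1 concrete without depending on the combinator library, here is a hand-written sketch of the same left fold. It is not the library's implementation: the whitespace tokenization and the helper names literal and parse are assumptions made purely for illustration.

```scala
object ChainSketch {
  sealed trait BoolExpression
  case class BoolLiteral(value: Boolean) extends BoolExpression
  case class AndExpression(l: BoolExpression, r: BoolExpression) extends BoolExpression

  // Hypothetical helper: turns a single token into a literal, if it is one.
  def literal(token: String): Option[BoolExpression] = token match {
    case "true"  => Some(BoolLiteral(true))
    case "false" => Some(BoolLiteral(false))
    case _       => None
  }

  // Parses `literal ("and" literal)*`. The first literal is consumed before
  // anything else happens, so there is no left recursion; the remaining
  // ("and", literal) pairs are folded in from the left.
  def parse(input: String): Option[BoolExpression] =
    input.trim.split("\\s+").toList match {
      case first :: rest =>
        literal(first).flatMap { head =>
          rest.grouped(2).foldLeft(Option(head)) {
            case (Some(acc), List("and", tok)) =>
              literal(tok).map(AndExpression(acc, _))
            case _ => None // malformed pair, or an earlier step already failed
          }
        }
      case Nil => None
    }
}
```

With this sketch, "true and false and true" becomes AndExpression(AndExpression(true, false), true), which is the left-associated shape chainl1 is designed to build.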
The core of the issue is that the usual parser combinator libraries use parsing algorithms that don't support left recursion. One solution is to rewrite the grammar without left recursion, meaning that every parser needs to consume at least some input before invoking itself recursively. For instance, your grammar can be written like so:
def boolExpression: Parser[BoolExpression] = andExpression | boolLiteral
def andExpression: Parser[AndExpression] = (boolLiteral ~ "and" ~ boolExpression) ^^ {
case b1 ~ "and" ~ b2 => AndExpression(b1, b2)
}
But you can also use a parser combinator library that's based on another parsing algorithm that supports left recursion. I'm only aware of one for Scala:
https://github.com/djspiewak/gll-combinators
I don't know to what degree it is production ready.
//edit:
I think this one might also support left recursion. Again, I don't know the degree to which it is production ready.
https://github.com/djspiewak/parseback

Can Scala try-catch-finally expression be without braces?

I'm learning Scala and confused about the try-catch-finally syntax.
In Scala Syntax Specification, it says:
Expr1 ::= ‘try’ Expr [‘catch’ Expr] [‘finally’ Expr]
| ...
Can I write expression without { } blocks like this:
try
println("Hello")
catch
RuntimeException e => println("" + e)
finally
println("World")
Or the expression must be a block expression ?
Scala 3 (Dotty) is experimenting with optional braces (significant indentation) so the following works
scala> try
     |   1 / 0
     | catch
     |   case e => println(s"good catch $e")
     | finally
     |   println("Celebration dance :)")
     |
good catch java.lang.ArithmeticException: / by zero
Celebration dance :)
val res1: AnyVal = ()
where we note the handler
case e => println(s"good catch $e")
did not need braces as in Scala 2. In fact due to special treatment of case clauses after catch keyword the following would also work
scala> try
     |   1 / 0
     | catch
     | case e => println(s"good catch $e")
     | finally
     |   println("Celebration dance :)")
     |
good catch java.lang.ArithmeticException: / by zero
Celebration dance :)
val res2: AnyVal = ()
where we note the handler did not have to be indented after catch
catch
case e => println(s"good catch $e")
Sure.
import scala.util.control.NonFatal
def fourtyseven: PartialFunction[Throwable, Int] = {
case NonFatal(_) => 47
}
def attempt(n: => Int): Int =
try n catch fourtyseven finally println("done")
println(attempt(42))
println(attempt(sys.error("welp")))
This compiles and runs as expected, although I had to define the catch expression separately (as it requires braces).
You can play around with this code here on Scastie.
A few notes:
try/catch/finally is an expression that returns a value (this may be unfamiliar if you're coming from Java); you can read more about it in the docs
catch takes a PartialFunction from Throwable to some type, and the overall type of the try/catch expression is the least upper bound of the try body's type and the handler's type (e.g. say you have class Animal; class Dog extends Animal; class Cat extends Animal: if the try returns a Dog and the catch returns a Cat, the overall expression has type Animal)
I used the NonFatal extractor in the catch partial function, you can read more about it in this answer, while extractor objects in general are described here in the official docs
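The Animal/Dog/Cat note above can be checked with a small sketch. The method name adopt and the exception used to trigger the handler are hypothetical; the point is only that the result type of adopt is inferred, not declared.

```scala
object TryAsExpression {
  class Animal
  class Dog extends Animal
  class Cat extends Animal

  // No declared result type: the compiler infers the least upper bound
  // of Dog (try body) and Cat (catch handler), which is Animal.
  def adopt(dogAvailable: Boolean) =
    try {
      if (!dogAvailable) throw new IllegalStateException("no dogs left")
      new Dog
    } catch {
      case _: IllegalStateException => new Cat
    }

  // Both assignments compile because the inferred type conforms to Animal.
  val fromTry: Animal = adopt(dogAvailable = true)
  val fromCatch: Animal = adopt(dogAvailable = false)
}
```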

Omitting parenthesis

Here is Scala code
#1
def method1 = {
map1.foreach({
case(key, value) => { println("key " + key + " value " + value) }
})
}
#2
def method1 = {
map1.foreach{
case(key, value) => { println("key " + key + " value " + value) }
}
}
It almost makes sense to me, but I want to make it clearer: why is it possible to omit the parentheses in this case?
You can always exchange methods argument parentheses for curly braces in Scala. For example
def test(i: Int): Unit = {}
test { 3 }
The base for this is the definition of argument expressions, covered by section §6.6 of the Scala Language Specification (SLS):
ArgumentExprs ::= ‘(’ [Exprs] ‘)’
| ‘(’ [Exprs ‘,’] PostfixExpr ‘:’ ‘_’ ‘*’ ‘)’
| [nl] BlockExpr
The curly braces are covered by the last case (block expression), which essentially is ‘{’ Block ‘}’ (cf. beginning of chapter 6 SLS).
This doesn't go for conditional expressions, if, (§6.16 SLS) and while loop expressions (§6.17 SLS), but it works for for comprehensions (§6.19 SLS), somewhat of an inconsistency.
A pattern match or a pattern-matching anonymous function literal, on the other hand, must be defined with curly braces, e.g. { case i: Int => i + i }; parentheses are not allowed here (§8.5 SLS).
In your method call, foreach takes a function argument, so you can drop the redundant parentheses or double braces:
List(1, 2).foreach({ case i => println(i) })
List(1, 2).foreach {{ case i => println(i) }}
List(1, 2).foreach { case i => println(i) } // no reason to have double braces
In this case, the pattern matching doesn't really buy you anything, and you can use a regular (non-pattern-matching) function, and thus the following would work, too:
List(1, 2).foreach(i => println(i)) // §6.23 SLS - anonymous functions
List(1, 2).foreach(println) // §6.26.2 / §6.26.5 - eta expansion
In your case though, a Map's foreach method passes a tuple of key and value to the function. That's why you use pattern matching (case clauses) to deconstruct that tuple, so you are bound to have curly braces. That is nicer than writing
map1.foreach(tup => println("key " + tup._1 + " value " + tup._2))
As a side note, putting braces around pattern matching case bodies is considered bad style; they are not necessary, even if the body spans multiple lines. So instead of
case(key, value) => { println("key " + key + " value " + value) }
you should write
case (key, value) => println("key " + key + " value " + value)
There is a bit of polemic in this blog post regarding the different variants of using braces, dots and parentheses in Scala (section "What's not to like"). In the end, you are to decide which is the best style—this is where people advocating "opinionated" versus "un-opinionated" languages fight with each other.
In general, you need curly braces when the expression spans multiple lines or if you have a pattern match. When calling methods with multiple parameter lists, often the last list contains one function argument, so you get nice looking—subjective judgment of course—syntax:
val l = List(1, 2, 3)
l.foldLeft(0) {
(sum, i) => sum + i
}
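As a small sketch of the equivalences discussed above, the following applies the same function to a map's entries in three ways. It uses map on a List instead of foreach so that the results can be compared; the names are hypothetical.

```scala
object BraceForms {
  val map1 = Map("a" -> 1, "b" -> 2)

  // Parentheses wrapped around a braced pattern-matching function literal.
  val parensAndBraces: List[String] =
    map1.toList.map({ case (key, value) => s"key $key value $value" })

  // The braces alone are already a valid argument list (a block expression).
  val bracesOnly: List[String] =
    map1.toList.map { case (key, value) => s"key $key value $value" }

  // Without pattern matching, parentheses suffice, at the cost of tuple accessors.
  val tupleAccessors: List[String] =
    map1.toList.map(tup => s"key ${tup._1} value ${tup._2}")
}
```

All three values hold the same strings; only the surface syntax differs.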

Explain this pattern matching code

This code is from Querying a Dataset with Scala's Pattern Matching:
object & { def unapply[A](a: A) = Some((a, a)) }
"Julie" match {
case Brothers(_) & Sisters(_) => "Julie has both brother(s) and sister(s)"
case Siblings(_) => "Julie's siblings are all the same sex"
case _ => "Julie has no siblings"
}
// => "Julie has both brother(s) and sister(s)"
How does & actually work? I don't see a Boolean test anywhere for the conjunction. How does this Scala magic work?
Here's how unapply works in general:
When you do
obj match {case Pattern(foo, bar) => ... }
Pattern.unapply(obj) is called. This can either return None in which case the pattern match is a failure, or Some(x,y) in which case foo and bar are bound to x and y.
If instead of Pattern(foo, bar) you did Pattern(OtherPattern, YetAnotherPattern) then x would be matched against the pattern OtherPattern and y would be matched against YetAnotherPattern. If all of those pattern matches are successful, the body of the match executes, otherwise the next pattern is tried.
When the name of a pattern is not alphanumeric, but a symbol (like &), it is used infix, i.e. you write foo & bar instead of &(foo, bar).
So here & is a pattern that always returns Some(a,a) no matter what a is. So & always matches and binds the matched object to its two operands. In code that means that
obj match {case x & y => ...}
will always match and both x and y will have the same value as obj.
In the example above this is used to apply two different patterns to the same object.
I.e. when you do
obj match { case SomePattern & SomeOtherPattern => ...}
first the pattern & is applied. As I said, it always matches and binds obj to its LHS and its RHS. So then SomePattern is applied to &'s LHS (which is the same as obj) and SomeOtherPattern is applied to &'s RHS (which is also the same as obj).
So in effect, you just applied two patterns to the same object.
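The question does not show Brothers, Sisters or Siblings, so the following is only a sketch with made-up data and hypothetical extractor definitions. It is meant to show that & needs no Boolean test: the conjunction comes from both sub-patterns having to match the duplicated value.

```scala
object ConjunctionPattern {
  // Hypothetical data set: name -> (brothers, sisters).
  val family: Map[String, (List[String], List[String])] = Map(
    "Julie" -> (List("Tom"), List("Ann")),
    "Bob"   -> (List("Tim"), Nil),
    "Zoe"   -> (Nil, Nil)
  )

  // Hypothetical extractors: each matches only if the list is non-empty.
  object Brothers {
    def unapply(name: String): Option[List[String]] =
      family.get(name).map(_._1).filter(_.nonEmpty)
  }
  object Sisters {
    def unapply(name: String): Option[List[String]] =
      family.get(name).map(_._2).filter(_.nonEmpty)
  }
  object Siblings {
    def unapply(name: String): Option[List[String]] =
      family.get(name).map { case (b, s) => b ++ s }.filter(_.nonEmpty)
  }

  // Always succeeds and hands the same value to both operand patterns.
  object & {
    def unapply[A](a: A): Option[(A, A)] = Some((a, a))
  }

  def describe(name: String): String = name match {
    case Brothers(_) & Sisters(_) => s"$name has both brother(s) and sister(s)"
    case Siblings(_)              => s"$name's siblings are all the same sex"
    case _                        => s"$name has no siblings"
  }
}
```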
Let's do this from the code. First, a small rewrite:
object & { def unapply[A](a: A) = Some(a, a) }
"Julie" match {
// case Brothers(_) & Sisters(_) => "Julie has both brother(s) and sister(s)"
case &(Brothers(_), Sisters(_)) => "Julie has both brother(s) and sister(s)"
case Siblings(_) => "Julie's siblings are all the same sex"
case _ => "Julie has no siblings"
}
The new rewrite means exactly the same thing. The comment line is using infix notation for extractors, and the second is using normal notation. They both translate to the same thing.
So, Scala will feed "Julie" to the extractor, repeatedly, until all unbound variables got assigned to Some thing. The first extractor is &, so we get this:
&.unapply("Julie") == Some(("Julie", "Julie"))
We got Some back, so we can proceed with the match. Now we have a tuple of two elements, and we have two extractors inside & as well, so we feed each element of the tuple to each extractor:
Brothers.unapply("Julie") == ?
Sisters.unapply("Julie") == ?
If both of these return Some thing, then the match is successful. Just for fun, let's rewrite this code without pattern matching:
val pattern = "Julie"
val extractor1 = &.unapply(pattern)
if (extractor1.nonEmpty && extractor1.get.isInstanceOf[Tuple2[_, _]]) {
val extractor11 = Brothers.unapply(extractor1.get._1)
val extractor12 = Sisters.unapply(extractor1.get._2)
if (extractor11.nonEmpty && extractor12.nonEmpty) {
"Julie has both brother(s) and sister(s)"
} else {
"Test Siblings and default case, but I'll skip it here to avoid repetition"
}
} else {
val extractor2 = Siblings.unapply(pattern)
if (extractor2.nonEmpty) {
"Julie's siblings are all the same sex"
} else {
"Julie has no siblings"
}
}
Ugly looking code, even without optimizing to only get extractor12 if extractor11 isn't empty, and without the code repetition that should have gone where there's a comment. So I'll write it in yet another style:
val pattern = "Julie"
& unapply pattern filter (_.isInstanceOf[Tuple2[_, _]]) flatMap { pattern1 =>
Brothers unapply pattern1._1 flatMap { _ =>
Sisters unapply pattern1._2 map { _ =>
"Julie has both brother(s) and sister(s)"
}
}
} getOrElse {
Siblings unapply pattern map { _ =>
"Julie's siblings are all the same sex"
} getOrElse {
"Julie has no siblings"
}
}
The pattern of flatMap/map at the beginning suggests yet another way of writing this:
val pattern = "Julie"
(
for {
pattern1 <- & unapply pattern
if pattern1.isInstanceOf[Tuple2[_, _]]
_ <- Brothers unapply pattern1._1
_ <- Sisters unapply pattern1._2
} yield "Julie has both brother(s) and sister(s)"
) orElse (
for {
_ <- Siblings unapply pattern
} yield "Julie's siblings are all the same sex"
) getOrElse (
"Julie has no siblings"
)
You should be able to run all this code and see the results for yourself.
For additional info, I recommend reading the Infix Operation Patterns section (8.1.10) of the Scala Language Specification.
An infix operation pattern p op q is a shorthand for the constructor or extractor pattern op(p,q). The precedence and associativity of operators in patterns is the same as in expressions.
Which is pretty much all there is to it, but then you can read about constructor and extractor patterns and patterns in general. It helps separate the syntactic sugar aspect (the "magic" part of it) from the fairly simple idea of pattern matching:
A pattern is built from constants, constructors, variables and type tests. Pattern matching tests whether a given value (or sequence of values) has the shape defined by a pattern, and, if it does, binds the variables in the pattern to the corresponding components of the value (or sequence of values).

Grammars, Scala Parsing Combinators and Orderless Sets

I'm writing an application that will take in various "command" strings. I've been looking at the Scala combinator library to tokenize the commands. I find in a lot of cases I want to say: "These tokens are an orderless set, and so they can appear in any order, and some might not appear".
With my current knowledge of grammars I would have to define all combinations of sequences as such (pseudo grammar):
command = action~content
action = alphanum
content = (tokenA~tokenB~tokenC | tokenB~tokenC~tokenA | tokenC~tokenB~tokenA ....... )
So my question is, considering tokenA-C are unique, is there a shorter way to define a set of any order using a grammar?
You can use the "Parser.^?" operator to check a group of parse elements for duplicates.
def tokens = tokenA | tokenB | tokenC
def uniqueTokens = (tokens*) ^? (
{ case t if (t == t.distinct) => t },
{ "duplicate tokens found: " + _ })
Here is an example that allows you to enter any of the four stooges in any order, but fails to parse if a duplicate is encountered:
package blevins.example
import scala.util.parsing.combinator._
case class Stooge(name: String)
object StoogesParser extends RegexParsers {
def moe = "Moe".r
def larry = "Larry".r
def curly = "Curly".r
def shemp = "Shemp".r
def stooge = ( moe | larry | curly | shemp ) ^^ { case s => Stooge(s) }
def certifiedStooge = stooge | """\w+""".r ^? (
{ case s: Stooge => s },
{ "not a stooge: " + _ })
def stooges = (certifiedStooge*) ^? (
{ case x if (x == x.distinct) => x.toSet },
{ "duplicate stooge in: " + _ })
def parse(s: String): String = {
parseAll(stooges, new scala.util.parsing.input.CharSequenceReader(s)) match {
case Success(r,_) => r.mkString(" ")
case Failure(r,_) => "failure: " + r
case Error(r,_) => "error: " + r
}
}
}
And some example usage:
package blevins.example
object StoogesApp extends App {
def printParse(s: String): Unit = println(StoogesParser.parse(s))
printParse("Moe Shemp Larry")
printParse("Moe Shemp Shemp")
printParse("Curly Beyonce")
/* Output:
Stooge(Moe) Stooge(Shemp) Stooge(Larry)
failure: duplicate stooge in: List(Stooge(Moe), Stooge(Shemp), Stooge(Shemp))
failure: not a stooge: Beyonce
*/
}
There are ways around it. Take a look at the parser here, for example. It accepts 4 pre-defined numbers, which may appear in any order, but must appear once, and only once.
OTOH, you could write a combinator, if this pattern happens often:
def comb3[A](a: Parser[A], b: Parser[A], c: Parser[A]) =
a ~ b ~ c | a ~ c ~ b | b ~ a ~ c | b ~ c ~ a | c ~ a ~ b | c ~ b ~ a
I would not try to enforce this requirement syntactically. I'd write a production that admits multiple tokens from the set allowed and then use a non-parsing approach to ascertaining the acceptability of the keywords actually given. In addition to allowing a simpler grammar, it will allow you to more easily continue parsing after emitting a diagnostic about the erroneous usage.
Randall Schulz
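A minimal sketch of that approach, assuming the grammar has already produced a flat list of token names: the parser accepts any sequence, and a separate check afterwards rejects unknown or repeated tokens. The token names, the validate function and the error messages are all hypothetical.

```scala
object OrderlessTokens {
  // Hypothetical set of tokens the command accepts, in any order.
  val allowed: Set[String] = Set("tokenA", "tokenB", "tokenC")

  // Validation happens after parsing, so the grammar itself stays
  // a simple "zero or more tokens" production.
  def validate(tokens: List[String]): Either[String, Set[String]] = {
    val unknown    = tokens.filterNot(allowed)
    // diff is a multiset difference, so what remains are the repeated tokens.
    val duplicates = tokens.diff(tokens.distinct).distinct
    if (unknown.nonEmpty) Left("unknown tokens: " + unknown.mkString(", "))
    else if (duplicates.nonEmpty) Left("duplicate tokens: " + duplicates.mkString(", "))
    else Right(tokens.toSet)
  }
}
```

Because the check returns an Either instead of failing the parse, the caller can report the problem and still continue with the rest of the input.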
I don't know what kind of constructs you want to support, but I gather you should be specifying a more specific grammar. From your comment to another answer:
todo message:link Todo class to database
I guess you don't want to accept something like
todo message:database Todo to link class
So you probably want to define some message-level keywords like "link" and "to"...
def token = alphanum~':'~ "link" ~ alphanum ~ "class" ~ "to" ~ alphanum
^^ { (a:String,b:String,c:String) => /* a == "message", b="Todo", c="database" */ }
I guess you would have to define your grammar at that level.
You could of course write a combination rule that does this for you if you encounter this situation frequently.
On the other hand, maybe the option exists to make "tokenA..C" just "token" and then differentiate inside the handler of "token"