Reuse parser within another parser with Scala parser combinators - scala

I have a parser for arithmetic expressions:
object FormulaParser extends JavaTokenParsers {
  def apply(input: String) = parseAll(formula, input)
  // ...
  val formula: Parser[Formula] =
    comparison | comparable | concatenable | term | factor
}
I need to parse a different language that can contain formulas. Let's say I need to parse something like X < formula. Unfortunately I cannot reuse FormulaParser.formula in my new parser:
object ConditionParser extends JavaTokenParsers {
  def apply(input: String) = parseAll(condition, input)
  // ...
  val condition: Parser[Condition] =
    "X" ~ ("<=" | "<") ~ FormulaParser.formula ^^ { ... } // doesn't work
}
because the parser on the left-hand side of ~ is an instance of ConditionParser.Parser, so its ~ method expects something with that same type, not something of type FormulaParser.Parser.
The whole point of using parser combinators is, well, combining parsers! It seems silly to me that my first attempt didn't work, although I understand why it happens (we are reusing base parsers by extending a base trait).
Is there a simple way to combine parsers defined in different types?

In order to reuse parsers, you need to use inheritance. So if you make FormulaParsers a class or a trait, ConditionParser can inherit from it and reuse its parsers.
This is also how you're already reusing the parsers defined in JavaTokenParsers.
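A minimal sketch of that inheritance-based approach follows. Formula and Condition here are hypothetical, much-reduced stand-ins for the asker's types, the formula grammar is cut down to a single number, and the code assumes the scala-parser-combinators module is on the classpath:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// Hypothetical stand-ins for the asker's Formula and Condition types.
final case class Formula(value: Double)
final case class Condition(op: String, rhs: Formula)

// The formula grammar lives in a trait so that other parsers can mix it in.
trait FormulaParsers extends JavaTokenParsers {
  // Reduced to a single number for illustration; the real grammar would go here.
  def formula: Parser[Formula] =
    floatingPointNumber ^^ (s => Formula(s.toDouble))
}

object FormulaParser extends FormulaParsers {
  def apply(input: String): ParseResult[Formula] = parseAll(formula, input)
}

object ConditionParser extends FormulaParsers {
  def apply(input: String): ParseResult[Condition] = parseAll(condition, input)

  // formula is now a ConditionParser.Parser[Formula], so ~ accepts it.
  def condition: Parser[Condition] =
    "X" ~> ("<=" | "<") ~ formula ^^ { case op ~ f => Condition(op, f) }
}
```

With this layout, ConditionParser("X <= 3.5") should succeed with Condition("<=", Formula(3.5)), because both parsers now share the same Parser type member.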

Related

Understanding Scala Implicits

While reading Functional Programming in Scala by Chiusano and Bjarnason, I encountered the following code in chapter 9, Parser Combinators:
trait Parsers[ParseError, Parser[+_]] { self =>
  ...
  def or[A](s1: Parser[A], s2: Parser[A]): Parser[A]
  implicit def string(s: String): Parser[String]
  implicit def operators[A](p: Parser[A]) = ParserOps[A](p)
  implicit def asStringParser[A](a: A)(implicit f: A => Parser[String]):
    ParserOps[String] = ParserOps(f(a))
  case class ParserOps[A](p: Parser[A]) {
    def |[B>:A](p2: Parser[B]): Parser[B] = self.or(p,p2)
    def or[B>:A](p2: => Parser[B]): Parser[B] = self.or(p,p2)
  }
}
I understand that if there is a type incompatibility or missing parameters during compilation, the Scala compiler would look for a missing function that converts the non-matching type to the desired type or a variable in scope with the desired type that fits the missing parameter respectively.
If a string occurs in a place that requires a Parser[String], the string function in the above trait should be invoked to convert the string to a Parser[String].
However, I've difficulties understanding the operators and asStringParser functions. These are the questions that I have:
For the implicit operators function, why isn't there a return type?
Why is ParserOps defined as a case class and why can't the | or or function be defined in the Parsers trait itself?
What exactly is the asStringParser trying to accomplish? What is its purpose here?
Why is self needed? The book says, "Use self to explicitly disambiguate reference to the or method on the trait," but what does it mean?
I'm truly enjoying the book but the use of advanced language-specific constructs in this chapter is hindering my progress. It would be of great help if you can explain to me how this code works. I understand that the goal is to make the library "nicer" to use through operators like | and or, but don't understand how this is done.
Every method has a return type. In this case, it's ParserOps[A]. You don't have to write it out explicitly, because in this case it can be inferred automatically.
Probably because of the automatically provided ParserOps.apply-factory method in the companion object. You need fewer vals in the constructor, and you don't need the new keyword to instantiate ParserOps. It is not used in pattern matching though, so, you could do the same thing with an ordinary (non-case) class, wouldn't matter.
It's the "pimp-my-library"-pattern. It attaches methods | and or to Parser, without forcing Parser to inherit from anything. In this way, you can later declare Parser to be something like ParserState => Result[A], but you will still have methods | and or available (even though Function1[ParserState, Result[A]] does not have them).
You could put | and or directly in Parsers, but then you would have to use the syntax
|(a, b)
or(a, b)
instead of the much nicer
a | b
a or b
There are no "real operators" in Scala, everything is a method. If you want to implement a method that behaves as if it were an infix operator, you do exactly what is done in the book.
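To make the pattern concrete, here is a small self-contained sketch. It is not the book's code: MiniParsers and its toy Parser representation (a function from input to an optional result) are assumptions made up for illustration. It also shows why the self alias matters: inside ParserOps, a bare or would refer to ParserOps's own one-argument or, so the two-argument version on the trait has to be reached through self.

```scala
// Hypothetical toy library, only meant to illustrate the extension-method pattern.
trait MiniParsers { self =>
  // Assumed representation: a parser is a plain function with no | or `or` methods.
  type Parser[+A] = String => Option[A]

  def string(s: String): Parser[String] =
    input => if (input.startsWith(s)) Some(s) else None

  // The "real" combinator, defined once on the trait in prefix form.
  def or[A](p1: Parser[A], p2: => Parser[A]): Parser[A] =
    input => p1(input).orElse(p2(input))

  // Wrapper that attaches infix methods to any Parser[A].
  implicit class ParserOps[A](p: Parser[A]) {
    // self.or is the trait's two-argument method, not the one defined below.
    def |[B >: A](p2: => Parser[B]): Parser[B] = self.or(p, p2)
    def or[B >: A](p2: => Parser[B]): Parser[B] = self.or(p, p2)
  }
}

object MiniParsers extends MiniParsers
```

After import MiniParsers._, an expression such as string("a") | string("b") compiles because the compiler wraps the left operand in ParserOps, even though a plain function has no | method.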

Map an instance using function in Scala

Say I have a local method/function
def withExclamation(string: String) = string + "!"
Is there a way in Scala to transform an instance by supplying this method? Say I want to append an exclamation mark to a string. Something like:
val greeting = "Hello"
val loudGreeting = greeting.applyFunction(withExclamation) //result: "Hello!"
I would like to be able to invoke (local) functions when writing a chain transformation on an instance.
EDIT: Multiple answers show how to program this possibility, so it seems that this feature is not present on an arbitrary class. To me this feature seems incredibly powerful. Consider where in Java I want to execute a number of operations on a String:
appendExclamationMark(" Hello ".trim().toUpperCase()); //"HELLO!"
The order of operations is not the same as how they read. The last operation, appendExclamationMark is the first word that appears. Currently in Java I would sometimes do:
Function.<String>identity()
.andThen(String::trim)
.andThen(String::toUpperCase)
.andThen(this::appendExclamationMark)
.apply(" Hello "); //"HELLO!"
Which reads better in terms of expressing a chain of operations on an instance, but also contains a lot of noise, and it is not intuitive to have the String instance at the last line. I would want to write:
" Hello "
.applyFunction(String::trim)
.applyFunction(String::toUpperCase)
.applyFunction(this::withExclamation); //"HELLO!"
Obviously the name of the applyFunction function can be anything (shorter please). I thought backwards compatibility was the sole reason Java's Object does not have this.
Is there any technical reason why this was not added on, say, the Any or AnyRef classes?
You can do this with an implicit class which provides a way to extend an existing type with your own methods:
object StringOps {
  implicit class RichString(val s: String) extends AnyVal {
    def withExclamation: String = s"$s!"
  }
  def main(args: Array[String]): Unit = {
    val m = "hello"
    println(m.withExclamation)
  }
}
Yields:
hello!
If you want to apply any functions (anonymous, converted from methods, etc.) in this way, you can use a variation on Yuval Itzchakov's answer:
object Combinators {
  implicit class Combinators[A](val x: A) {
    def applyFunction[B](f: A => B) = f(x)
  }
}
A while after asking this question, I noticed that Kotlin has this built in:
inline fun <T, R> T.let(block: (T) -> R): R
Calls the specified function block with this value as its argument and returns
its result.
A lot more, quite useful variations of the above function are provided on all types, like with, also, apply, etc.
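For what it's worth, Scala 2.13 and later ship a comparable helper in the standard library: importing scala.util.chaining._ adds pipe (and tap) to every value. A short sketch, assuming Scala 2.13+; PipeExample is just an illustrative name:

```scala
import scala.util.chaining._

object PipeExample {
  def withExclamation(string: String): String = string + "!"

  // Each step reads in the order it is applied.
  val loud: String =
    " Hello "
      .pipe(_.trim)
      .pipe(_.toUpperCase)
      .pipe(withExclamation)
}
```

Here pipe plays the role of the applyFunction method sketched in the answers above.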

Value to indicate to use default

In Scala I would like to have something like this
TokenizerExample.scala
class TokenizerExample private (whateva : Any)(implicit val separator : Char = '.') {
def this(data2Tokenize : String)(implicit s : Char) {
this("", s) //call to base constructor
}
def this(data2Tokenize : Array[Char])(implicit s : Char) {
this("", s) //call to base constructor
}
}
What I would like to achieve is to let the user call either of the two public constructors with or without a separator; if they do NOT provide one, the default from the base constructor should be used automatically. Is there a value that I can pass to the base constructor so that Scala uses the default declared on the private base constructor?
What I would like to avoid is repeating the default in each constructor:
def this(_3rdConstructor : StringBuilder)(implicit s : Char = '.') ...
I tried this in many different ways, with the values being implicit and with the separator as an Option, but I did not get a result that I actually like, especially because Scala complains about having implicit values in multiple constructors (which kind of defeats the purpose of having them). Is there a way to achieve that behavior in a nice way without
1) forcing the user to provide a separator,
2) going into "bad practices" by passing null values and then validating them (especially because that would not allow my separator to be a val in the constructor), or
3) creating YET ANOTHER LANGUAGE just because I dislike a small little thing about one of them :) ?
I would strongly advise you against using implicits for this purpose. The resolution rules are rather complex, and they make the code extremely hard to follow, because it is almost impossible to tell what value the constructor will end up receiving without a debugger.
If all you are trying to do is avoid defining the default in multiple places, just define it in a companion object:
object Foo {
  val defaultParam = ','
}
class Foo {
  import Foo.defaultParam
  def this(data: String, param: Char = defaultParam) = ???
  def this(data: List[Char], param: Char = defaultParam) = ???
  // etc ...
}
If you insist on using implicits, you can use a similar approach to the above: make the defaultParam definition implicit, drop the defaults and replace them with implicit parameter lists, and then import Foo._ into the scope where you are making the call. But, really, don't do that: it adds no value, and only has disadvantages in this case.
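A compilable variant of the companion-object idea might look like the sketch below. Tokenizer, DefaultSeparator and tokens are hypothetical names. As far as I know, Scala 2 rejects default arguments on more than one overloaded constructor, and an auxiliary constructor has to start with a call to an earlier one, so this sketch spells out the one-argument overloads instead of using defaults; the default value itself is still defined in exactly one place.

```scala
object Tokenizer {
  // The single place where the default separator is defined.
  val DefaultSeparator: Char = '.'
}

// Private primary constructor; users go through the public overloads below.
class Tokenizer private (val data: Vector[Char], val separator: Char) {
  def this(data: String, separator: Char) = this(data.toVector, separator)
  def this(data: String) = this(data, Tokenizer.DefaultSeparator)

  def this(data: Array[Char], separator: Char) = this(data.toVector, separator)
  def this(data: Array[Char]) = this(data, Tokenizer.DefaultSeparator)

  // Splits the stored characters on the separator.
  def tokens: Vector[String] = data.mkString.split(separator).toVector
}
```

Callers can then write new Tokenizer("a.b") or new Tokenizer("a,b", ','), and separator stays a val on the class.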

Is there a good way in Scala to interpret the types of values in a CSV

Suppose I'm given a CSV with the following values:
0, 1.00, Hello
3, 2.13, World
.
.
.
Is there a good method or library that could automatically detect the best type to classify a given column as? In this case (Int, Float, String).
For more context, I'm attempting to extend a CSV parsing library to allow it to report histogram like data on the CSV that is passed in. The idea is to make it very easy to add certain validation tasks into this framework so as to figure out deficiencies or irregularities in a CSV data dump.
Initially I thought of writing something where a user could supply a config file that specified the types, but for cases when the CSV column sets are very large, or just for ease of use, I'd like to attempt to automatically detect the types instead of making the user write them out.
One answer might be:
import scala.util.Try
def parse(s:String): Any = Try(s.toInt) orElse(Try(s.toDouble)) getOrElse(s)
Then you can use pattern-matching to do whatever you want with it.
You could, of course, first do regular-expression tests on the string to see which type you have. But I'm fairly sure just brute-forcing the parse for each format, as above, will be faster.
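As a rough sketch of how the brute-force parse could be extended to whole columns: classify each cell, then let the widest type seen in a column win. ColumnTypes, ColType and the helper names are made up for illustration, cells are trimmed before parsing (an assumption, since the sample has spaces after the commas), and rows are assumed to have the same number of fields with no quoted commas.

```scala
import scala.util.Try

object ColumnTypes {
  sealed trait ColType
  case object IntCol extends ColType
  case object DoubleCol extends ColType
  case object StringCol extends ColType

  // Same brute-force idea as above, with whitespace trimmed first.
  def parse(s: String): Any =
    Try(s.trim.toInt).orElse(Try(s.trim.toDouble)).getOrElse(s.trim)

  def cellType(s: String): ColType = parse(s) match {
    case _: Int    => IntCol
    case _: Double => DoubleCol
    case _         => StringCol
  }

  // The widest type seen in the column wins: String > Double > Int.
  def columnType(cells: Seq[String]): ColType = {
    val seen = cells.map(cellType).toSet
    if (seen.contains(StringCol)) StringCol
    else if (seen.contains(DoubleCol)) DoubleCol
    else IntCol
  }

  // Naive split on commas; transpose turns rows into columns.
  def infer(rows: Seq[String]): Seq[ColType] =
    rows.map(_.split(",").toSeq).transpose.map(columnType)
}
```

For the two sample rows in the question, ColumnTypes.infer would classify the columns as IntCol, DoubleCol and StringCol.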
Consider parser combinators; the inferred types are reported as a list of case class instances:
import scala.util.parsing.combinator._
trait CSVType
// Case classes need an explicit (here empty) parameter list.
case class LiteralStr() extends CSVType
case class Float() extends CSVType
case class Integer() extends CSVType
case class Bool() extends CSVType
case class NA() extends CSVType // Not Available
class CSV extends JavaTokenParsers {
  def row: Parser[List[CSVType]] = repsep(value, ",")
  // Numbers are tried first; a number with no fractional part counts as Integer.
  def value: Parser[CSVType] =
    floatingPointNumber ^^ { f => if (f.toDouble.toInt == f.toDouble) Integer()
                                  else Float() } |
    "NA" ^^ { na => NA() } |
    ("true" | "false") ^^ { b => Bool() } |
    stringLiteral ^^ { s => LiteralStr() }
}
object ParseExpr extends CSV with App {
  println("in: "+ args(0))
  println(parseAll(row, args(0)))
}
Hence
scala> val s = """1.23,2,true,NA,"hello" """
s: String = "1.23,2,true,NA,"hello" "
scala> ParseExpr.main(Array(s))
in: 1.23,2,true,NA,"hello"
[1.24] parsed: List(Float(), Integer(), Bool(), NA(), LiteralStr())
Note that combinators include the parsing of types such as numerics, boolean and strings. In addition, custom types are defined by the parser, for instance NA. See JavaTokenParsers trait for definitions used here.
Each case class may include additional logic to report typing in a most convenient way.

Chaining logging with a simple expression in Scala

I usually use Scala with SLF4J through the Loggable wrapper in LiftWeb. This works decently well with the exception of the quite common method made up only from 1 chain of expressions.
So if you want to add logging to such a method, the simply beautiful, no curly brackets
def method1():Z = a.doX(x).doY(y).doZ()
must become:
def method1():Z = {
  val v = a.doX(x).doY(y).doZ()
  logger.info("the value is %s".format(v))
  v
}
Not quite the same, is it? I gave it a try to solve it with this:
class ChainableLoggable[T](val v:T){
  def logInfo(logger:Logger, msg:String, other:Any*):T = {
    // pass v and the extra arguments as individual format arguments
    logger.info(msg.format((v +: other): _*))
    v
  }
}
implicit def anyToChainableLogger[T](v:T):ChainableLoggable[T] = new ChainableLoggable(v)
Now I can use a simpler form
def method1():Z = a.doX(x).doY(y).doZ() logInfo(logger, "the value is %s")
However 1 extra object instantiation and an implicit from Any starts to look like a code stink.
Does anyone know of any better solution? Or maybe I shouldn't even bother with this?
Scala 2.10 has a solution for exactly this: value classes, a new feature that gives you the same effect as implicit wrappers but without the overhead of instantiating those wrapper classes. To apply it, rewrite your code like so:
implicit class ChainableLoggable[T](val v : T) extends AnyVal {
  def logInfo(logger:Logger, msg:String, other:Any*) : T = {
    // pass v and the extra arguments as individual format arguments
    logger.info(msg.format((v +: other): _*))
    v
  }
}
Under the hood, the compiler transforms logInfo into an analogue of Java's common static "util" method by prepending your v : T to its argument list and updating its usages accordingly. See, nothing gets instantiated.
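To try the value-class version without SLF4J on the classpath, here is a self-contained sketch. RecordingLogger is a hypothetical stand-in that only collects messages, and method1 uses a made-up expression chain in place of a.doX(x).doY(y).doZ():

```scala
import scala.collection.mutable.ListBuffer

object ChainLoggingSketch {
  // Hypothetical stand-in for a real logger; it only records what it is given.
  final class RecordingLogger {
    val messages: ListBuffer[String] = ListBuffer.empty[String]
    def info(msg: String): Unit = { messages += msg; () }
  }

  // Implicit value class: adds logInfo to every value and returns the value unchanged.
  implicit class ChainableLoggable[T](val v: T) extends AnyVal {
    def logInfo(logger: RecordingLogger, msg: String, other: Any*): T = {
      logger.info(msg.format((v +: other): _*))
      v
    }
  }

  val logger = new RecordingLogger

  // The method body stays a single expression chain.
  def method1(): Int =
    List(1, 2, 3).map(_ * 2).sum.logInfo(logger, "the value is %s")
}
```

Calling ChainLoggingSketch.method1() returns the computed value and leaves one formatted message in the logger.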
That looks like the right way to do it, especially if you don't have the tap implicit around (not in the standard library, but something like this is fairly widely used--and tap is standard in Ruby):
class TapAnything[A](a: A) {
  def tap(f: A => Any): A = { f(a); a }
}
implicit def anything_can_be_tapped[A](a: A) = new TapAnything(a)
With this, it's less essential to have the info implicit on its own, but if you use it it's an improvement over
.tap(v => logger.info("the value is %s".format(v)))
If you want to avoid using implicits, you can define functions like this one in your own logging trait. Maybe not as pretty as the solution with implicits though.
def info[A](a:A)(message:A=>String) = {
  logger.info(message(a))
  a
}
info(a.doX(x).doY(y).doZ())("the value is " + _)