Scala Parser Token Delimiter Problem - scala

I'm trying to define a grammar for the commands below.
object ParserWorkshop {
  def main(args: Array[String]) = {
    ChoiceParser("todo link todo to database")
    ChoiceParser("todo link todo to database deadline: next tuesday context: app.model")
  }
}
The second command should be tokenized as:
action = todo
message = link todo to database
properties = [deadline: next tuesday, context: app.model]
When I run this input on the grammar defined below, I receive the following error message:
[1.27] parsed: Command(todo,link todo to database,List())
[1.36] failure: string matching regex `\z' expected but `:' found
todo link todo to database deadline: next tuesday context: app.model
^
As far as I can see, it fails because the pattern for matching the words of the message is nearly identical to the pattern for the key of the property key:value pair, so the parser cannot tell where the message ends and the property starts. I can solve this by insisting that a start token be used for each property, like so:
todo link todo to database :deadline: next tuesday :context: app.model
But I would prefer to keep the command as close to natural language as possible.
I have two questions:
What does the error message actually mean?
And how would I modify the existing grammar to work for the given input strings?
import scala.util.parsing.combinator._
case class Command(action: String, message: String, properties: List[Property])
case class Property(name: String, value: String)
object ChoiceParser extends JavaTokenParsers {
  def apply(input: String) = println(parseAll(command, input))
  def command = action~message~properties ^^ {case a~m~p => new Command(a, m, p)}
  def action = ident
  def message = """[\w\d\s\.]+""".r
  def properties = rep(property)
  def property = propertyName~":"~propertyValue ^^ {
    case n~":"~v => new Property(n, v)
  }
  def propertyName: Parser[String] = ident
  def propertyValue: Parser[String] = """[\w\d\s\.]+""".r
}

It is really simple. When you use ~, you have to understand that there's no backtracking on individual parsers which have completed successfully.
So, for instance, message got everything up to the colon, as all of that is an acceptable pattern. Next, properties is a rep of property, which requires propertyName, but it only finds the colon (the first char not gobbled by message). So propertyName fails, and property fails. Now, properties, as mentioned, is a rep, so it finishes successfully with 0 repetitions, which then makes command finish successfully.
So, back to parseAll. The command parser returned successfully, having consumed everything before the colon. It then asks the question: are we at the end of the input (\z)? No, because there is a colon right next. So, it expected end-of-input, but got a colon.
You'll have to change the regex so it won't consume the last identifier before a colon. For example:
def message = """[\w\d\s\.]+(?![:\w])""".r
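To see what the lookahead changes without running the whole grammar, you can apply both regexes to the part of the input that message gets to see. This is only a sketch using the standard library's Regex.findPrefixOf; the names LookaheadDemo, greedy, bounded and rest are made up for illustration. It also suggests that propertyValue, which uses the same greedy pattern, probably needs the same lookahead, otherwise it would swallow the key of the next property.

```scala
object LookaheadDemo {
  // The original message pattern: grabs every word, including the next property's key.
  val greedy = """[\w\d\s\.]+""".r
  // The suggested pattern: the match may not end right before a colon or in the middle of a word.
  val bounded = """[\w\d\s\.]+(?![:\w])""".r

  // What message sees once action has consumed "todo" and whitespace has been skipped.
  val rest = "link todo to database deadline: next tuesday context: app.model"

  def main(args: Array[String]): Unit = {
    println(greedy.findPrefixOf(rest))  // Some(link todo to database deadline)
    println(bounded.findPrefixOf(rest)) // Some(link todo to database)
    // The same lookahead applied to a property value stops before the next key.
    println(bounded.findPrefixOf("next tuesday context: app.model")) // Some(next tuesday)
  }
}
```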
By the way, when you use def you force the expression to be reevaluated. In other words, each of these defs creates a new parser every time it is called, and the regular expressions are instantiated every time the parsers they belong to are processed. If you change everything to val (or lazy val, so that a parser can safely refer to one defined further down), you'll get much better performance.
Remember, these things define the parser, they do not run it. It is parseAll which runs a parser.
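The def versus val difference can be checked with plain regexes, no parser library needed. A small sketch (DefVsVal, viaDef and viaVal are illustrative names, not part of the grammar above):

```scala
object DefVsVal {
  // Evaluated again on every access: each call compiles a fresh Regex.
  def viaDef = """[\w\d\s\.]+""".r
  // Evaluated once, when the object is initialized.
  val viaVal = """[\w\d\s\.]+""".r

  // eq compares references, so this tells us whether we got the same instance twice.
  val defIsShared: Boolean = viaDef eq viaDef // false: two different instances
  val valIsShared: Boolean = viaVal eq viaVal // true: one instance, reused

  def main(args: Array[String]): Unit = {
    println(defIsShared)
    println(valIsShared)
  }
}
```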

Related

Map an instance using function in Scala

Say I have a local method/function
def withExclamation(string: String) = string + "!"
Is there a way in Scala to transform an instance by supplying this method? Say I want to append an exclamation mark to a string. Something like:
val greeting = "Hello"
val loudGreeting = greeting.applyFunction(withExclamation) //result: "Hello!"
I would like to be able to invoke (local) functions when writing a chain transformation on an instance.
EDIT: Multiple answers show how to program this possibility, so it seems that this feature is not present on an arbitrary class. To me this feature seems incredibly powerful. Consider where in Java I want to execute a number of operations on a String:
appendExclamationMark(" Hello ".trim().toUpperCase()); //"HELLO!"
The order of operations is not the same as how they read. The last operation, appendExclamationMark is the first word that appears. Currently in Java I would sometimes do:
Function.<String>identity()
.andThen(String::trim)
.andThen(String::toUpperCase)
.andThen(this::appendExclamationMark)
.apply(" Hello "); //"HELLO!"
Which reads better in terms of expressing a chain of operations on an instance, but also contains a lot of noise, and it is not intuitive to have the String instance at the last line. I would want to write:
" Hello "
.applyFunction(String::trim)
.applyFunction(String::toUpperCase)
.applyFunction(this::withExclamation); //"HELLO!"
Obviously the name of the applyFunction function can be anything (shorter please). I thought backwards compatibility was the sole reason Java's Object does not have this.
Is there any technical reason why this was not added on, say, the Any or AnyRef classes?
You can do this with an implicit class which provides a way to extend an existing type with your own methods:
object StringOps {
  implicit class RichString(val s: String) extends AnyVal {
    def withExclamation: String = s"$s!"
  }
  def main(args: Array[String]): Unit = {
    val m = "hello"
    println(m.withExclamation)
  }
}
Yields:
hello!
If you want to apply any functions (anonymous, converted from methods, etc.) in this way, you can use a variation on Yuval Itzchakov's answer:
object Combinators {
  implicit class Combinators[A](val x: A) {
    def applyFunction[B](f: A => B) = f(x)
  }
}
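Here is how the chain from the question might look with such a helper. This is a sketch: ChainDemo and ApplyOps are illustrative names, and the second variant assumes Scala 2.13 or later, where the standard library's scala.util.chaining already offers pipe for exactly this purpose.

```scala
import scala.util.chaining._ // Scala 2.13+: adds pipe and tap to every value

object ChainDemo {
  def withExclamation(s: String): String = s + "!"

  // Hand-rolled helper in the spirit of applyFunction above (hypothetical name).
  implicit class ApplyOps[A](private val x: A) extends AnyVal {
    def applyFunction[B](f: A => B): B = f(x)
  }

  // Operations read in the order they are applied.
  val viaImplicit: String =
    " Hello "
      .applyFunction(_.trim)
      .applyFunction(_.toUpperCase)
      .applyFunction(withExclamation)

  // The same chain using the standard library's pipe.
  val viaPipe: String =
    " Hello ".pipe(_.trim).pipe(_.toUpperCase).pipe(withExclamation)

  def main(args: Array[String]): Unit = {
    println(viaImplicit) // HELLO!
    println(viaPipe)     // HELLO!
  }
}
```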
A while after asking this question, I noticed that Kotlin has this built in:
inline fun <T, R> T.let(block: (T) -> R): R
Calls the specified function block with this value as its argument and returns
its result.
A lot more, quite useful variations of the above function are provided on all types, like with, also, apply, etc.

Using regex parser within a JavaTokensParser subclass

I am trying out scala parser combinators with the following object:
object LogParser extends JavaTokenParsers with PackratParsers {
Some of the parsers are working. But the following one is getting tripped up:
def time = """([\d]{2}:[\d]{2}:[\d]{2}\.[\d]+)"""
Following is the input not working:
09:58:24.608891
On reaching that line we get:
[2.22] failure: `([\d]{2}:[\d]{2}:[\d]{2}\.[\d]+)' expected but `:' found
09:58:24.608891
Note: I did verify correct behavior of that regex within the scala repl on the same input pattern.
val r = """([\d]{2}):([\d]{2}):([\d]{2}\.[\d]+)""".r
val s = """09:58:24.608891"""
val r(t,t2,t3) = s
t: String = 09
t2: String = 58
t3: String = 24.608891
So, as far as the parser combinators are concerned: is there an issue with the ":" token itself, i.e. do I need to create my own custom lexer and add ":" to lexical.delimiters?
Update: an answer suggested adding ".r". I had already tried that, but to be explicit: the following has the same behavior (does not work):
def time = """([\d]{2}:[\d]{2}:[\d]{2}.[\d]+)""".r
I think you're just missing an .r at the end here to actually have a Regex as opposed to a string literal.
def time = """([\d]{2}:[\d]{2}:[\d]{2}\.[\d]+)"""
it should be
def time = """([\d]{2}:[\d]{2}:[\d]{2}\.[\d]+)""".r
The first one expects the text to be exactly like the regex string literal (which obviously isn't present), the second one expects text that actually matches the regex. Both create a Parser[String], so it's not immediately obvious that something is missing.
There's an implicit conversion from java.lang.String to Parser[String], so that string literals can be used as parser combinators.
There's an implicit conversion from scala.util.matching.Regex to Parser[String], so that regex expressions can be used as parser combinators.
http://www.scala-lang.org/files/archive/api/2.11.2/scala-parser-combinators/#scala.util.parsing.combinator.RegexParsers
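The difference between the two conversions can be approximated with the standard library alone. The sketch below does not use the parser combinators: it only mimics, roughly, what a literal parser (compare the characters of the string itself) and a regex parser (match a prefix of the input) look for. LiteralVsRegex and its members are illustrative names.

```scala
object LiteralVsRegex {
  val pattern = """([\d]{2}:[\d]{2}:[\d]{2}\.[\d]+)"""
  val input = "09:58:24.608891"

  // Without .r the parser looks for the pattern's own characters, i.e. text starting with "([\d]{2}:".
  val asLiteral: Boolean = input.startsWith(pattern)

  // With .r the parser looks for a prefix of the input that matches the pattern.
  val asRegex: Option[String] = pattern.r.findPrefixOf(input)

  def main(args: Array[String]): Unit = {
    println(asLiteral) // false
    println(asRegex)   // Some(09:58:24.608891)
  }
}
```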

Scala: line by line check for None, or check for exceptions at the end?

I have a case class Application that has some input ports, and each port has a name. Then I have another case class that assigns values to the ports of an application.
case class Port (id: ObjectId, name: String, PortType: String)
case class Application (id: ObjectId, ports: List[Port])
case class AppRun (appId: ObjectId, assignments: List[Assignment])
case class Assignment (portName: String, value: String, valueType: String)
I have the applications and their port information in a database, and I get as input an AppRun. I need to make a list of PortValue type (below) showing the value assigned to each port (and the matching is done on port names):
case class PortValue (portId: ObjectId, value: String)
There are a few things that may fail during this matching: application id is invalid, ports do not match, etc. It feels natural to me to write the straightforward algorithm and then catch all exceptions, but that seems Java-ish. On the other hand, I cannot think of a neat way of dealing with Options, checking them one by one, which will obfuscate the code.
The question is: how would you solve this the Scala way?
EDIT: I need to send a proper message back when such a mismatch happens, like "application not found", etc.
A way to deal with checking Options one by one is to use a for-comprehension. And if you want to keep track of errors you can quite often replace Option with some class that does error-tracking. The common possibilities include:
scala.util.Try[T]. Try is either a Success(result), or a Failure(error: Throwable). It is a built-in class in Scala, and it is simple to combine it with or replace it by scala.concurrent.Future if the need arises.
scala.util.Either[E, T]. Creating a Throwable for every error may not be very efficient because of the need to build the stacktrace. So Either is useful if the error can be a simple String or some application-specific class without the stacktrace. The convention is to have a Right(result) or a Left(error). The downsides are that it is not obvious that 'right' means 'success' and 'left' means 'error', and that before Scala 2.12, when you use it in a for-comprehension or call e.g. the map method on it, you have to specify whether you want either.right or either.left (since 2.12, Either is right-biased).
scalaz.\/[E, T] This is the same as Either, but the default for map and for-comprehension is its right side (\/-). Also scalaz provides very useful functions sequence and traverse (see the code below).
scalaz.Validation[Errors, T] or scalaz.ValidationNel[E, T]. Adds a very useful functionality of collecting all the errors, but has slight problems when used in for-comprehensions.
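As a small illustration of the for-comprehension idea with plain String errors, here is a sketch using Either. It assumes Scala 2.12 or later (right-biased Either), and the data and names (EitherSketch, apps, portOf) are made up and much simpler than the case classes in the question.

```scala
object EitherSketch {
  // Hypothetical stand-in for the database: application id -> port names.
  val apps: Map[Int, List[String]] = Map(1 -> List("in", "out"))

  // Each step turns a missing Option into a Left carrying the message to send back.
  def portOf(appId: Int, portName: String): Either[String, String] = for {
    ports <- apps.get(appId).toRight("application not found")
    port  <- ports.find(_ == portName).toRight(s"no port named $portName")
  } yield port

  def main(args: Array[String]): Unit = {
    println(portOf(1, "in"))  // Right(in)
    println(portOf(2, "in"))  // Left(application not found)
    println(portOf(1, "err")) // Left(no port named err)
  }
}
```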
Here is some sample code for your problem, using Try:
import scala.util.{Try, Success, Failure}
def getApplication(appId: ObjectId): Option[Application] = ???
/** Convert Option to Try, using a given failure in case of None */
def toTry[T](option: Option[T])(failure: => Throwable): Try[T] =
  option.fold[Try[T]](Failure(failure))(Success(_))
/** Convert a List of Try to a Try of List.
  * If all tries in the List are Success, the result is Success.
  * Otherwise the result is the first Failure from the list */
def sequence[T](tries: List[Try[T]]): Try[List[T]] =
  tries.find(_.isFailure) match {
    case Some(Failure(error)) => Failure(error)
    case _ => Success(tries.map(_.get))
  }
def traverse[T, R](list: List[T])(f: T => Try[R]): Try[List[R]] =
  sequence(list map f)
def portValues(task: AppRun): Try[List[PortValue]] = for {
  app <- toTry(getApplication(task.appId))(
    new RuntimeException("application not found"))
  portByName = app.ports.map(p => p.name -> p).toMap
  ports <- traverse(task.assignments) { assignment =>
    val tryPort = toTry(portByName.get(assignment.portName))(
      new RuntimeException(s"no port named ${assignment.portName}"))
    tryPort.map(port => PortValue(port.id, assignment.value))
  }
} yield ports
Some considerations:
Provided implementations of toTry, sequence and traverse are just a sample. For one, I'd define them in implicit classes to be able to call them like normal methods (e.g. option.toTry(error), or list.traverse(f)).
traverse can be implemented more efficiently (stopping as soon as the first error is found).
this sequence implementation would return only the first erroneous port.
I prefer API like def getApplication(id: ObjectId): Try[Application] instead of an Option result, because you usually want to have the same error in every part of the code that calls it, and it may give different errors as well (e.g., id not found or network error). If you have def getApplication(id: ObjectId): Application that may throw an error you can simply wrap it in Try: for { app <- Try(getApplication(id)) ...
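One possible shape for a traverse that stops at the first error is sketched below. It is an illustration rather than a drop-in replacement: TraverseSketch is a made-up name, and the recursion simply abandons the rest of the list once f returns a Failure.

```scala
import scala.util.{Try, Success, Failure}

object TraverseSketch {
  /** Apply f to each element in order; stop at the first Failure and return it. */
  def traverse[T, R](list: List[T])(f: T => Try[R]): Try[List[R]] = {
    @annotation.tailrec
    def loop(rest: List[T], acc: List[R]): Try[List[R]] = rest match {
      case Nil => Success(acc.reverse) // results were prepended, so restore the order
      case head :: tail =>
        f(head) match {
          case Success(r) => loop(tail, r :: acc)
          case Failure(e) => Failure(e) // remaining elements are never passed to f
        }
    }
    loop(list, Nil)
  }

  def main(args: Array[String]): Unit = {
    println(traverse(List("1", "2"))(s => Try(s.toInt)))        // Success(List(1, 2))
    println(traverse(List("1", "x", "3"))(s => Try(s.toInt)).isFailure) // true
  }
}
```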

What does the word "Action" do in a Scala function definition using the Play framework?

I am developing Play application and I've just started with Scala. I see that there is this word Action after the equals sign in the function below and before curly brace.
def index = Action {
Ok(views.html.index("Hi there"))
}
What does this code do? I've seen it used with def index = { but not with the word before the curly brace.
I would assume that the name of the function is index. But I do not know what the word Action does in this situation.
This word is a part of Play Framework, and it's an object, which has method apply(block: ⇒ Result), so your code is actually:
def index: Action[AnyContent] = Action.apply({
  Ok.apply(views.html.index("Hi there"))
})
Your index method returns an instance of the class Action[AnyContent].
By the way, you're passing a block of code {Ok(...)} to the apply method. The block is not evaluated on the spot, because the parameter type of apply is not just Result but ⇒ Result, a by-name parameter: it behaves like an anonymous function with no input parameters that returns a Result. So your Ok block will be executed only when the container that received your Action instance (from the index method) decides to execute it. In other words, you're just describing an action here, not executing it; it is actually executed when Play receives a request and finds a binding to your action in the routes file.
Also, you don't have to use def here, as you always return the same action; val or lazy val is usually enough. You need a def only if you actually want to pass some parameter from the routing table, for instance:
GET /clients/:id controllers.SomeController.index(id: Long)
def index(id: Long) = Action { ... } // new action generated for every new request here
Another possible approach is to choose Action, based on parameter:
def index(id: Long) = {
  if (id == 0) Action {...} else Action{...}
}
But usually you can use the routing table itself for that, which is better for decoupling. This example just shows that an Action is nothing more than a return value.
Update for #Kazuya
val method1 = Action{...} // could be def too, no big difference here
// The code inside this Action is called separately, after "index" (if method2 is requested, of course).
// Notice that it needs the whole request, so the request should be completely parsed by then.
val method2 = Action{ req => // you can extract additional params from the request
  val param1 = req.headers("header1")
  ...
}
// This is called first. Notice that Play doesn't need the whole request body here, so it might not even be parsed at this stage.
def index(methodName: String) = methodName match {
  case "method1" => method1
  case "method2" => method2
}
GWT/Scala.js use a similar approach for client-server interaction. This is just one possible solution to explain the importance of the parameter "methodName" passed from the routing table. So, an action can be thought of as a wrapper over a function that in turn represents a reference to an OOP method, which makes it useful for both REST and RPC purposes.
The other answers deal with your specific case. You asked about the general case, however, so I'll attempt to answer from that perspective.
First off, def is used to define a method, not a function (better to learn that difference now). But, you're right, index is the name of that method.
Now, unlike other languages you might be familiar with (e.g., C, Java), Scala lets you define methods with an expression (as suggested by the use of the assignment operator syntax, =). That is, everything after the = is an expression that will be evaluated to a value each time the method is invoked.
So, whereas in Java you have to say:
public int three() { return 3; }
In Scala, you can just say:
def three = 3
Of course, the expression is usually more complicated (as in your case). It could be a block of code, like you're more used to seeing, in which case the value is that of the last expression in the block:
def three = {
  val a = 1
  val b = 2
  a + b
}
Or it might involve a method invocation on some other object:
def three = Numbers.add(1, 2)
The latter is, in fact, exactly what's going on in your specific example, although it requires a bit more explanation to understand why. There are two bits of magic involved:
If an object has an apply method, then you can treat the object as if it were a function. You can say, for example, Add(1, 2) when you really mean Add.apply(1,2) (assuming there's an Add object with an apply method, of course). And just to be clear, it doesn't have to be an object defined with the object keyword. Any object with a suitable apply method will do.
If a method has a single by-name parameter (e.g., def ifWaterBoiling(fn: => Tea)), then you can invoke the method like ifWaterBoiling { makeTea }. The code in that block is evaluated lazily (and may not be evaluated at all). This would be equivalent to writing ifWaterBoiling({ makeTea }). The { makeTea } part just defines an expression that gets passed in, unevaluated, for the fn parameter.
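Both bits of magic can be tried out without Play. In the sketch below, Handler is a hypothetical stand-in for Action (it is not Play's API): its companion object has an apply method with a by-name parameter, so Handler { ... } builds a value that describes some work without running it.

```scala
object ByNameSketch {
  // Hypothetical stand-in for Play's Action: it stores the block and runs it on demand.
  final class Handler(block: => String) {
    def run(): String = block // the block is evaluated here, on every call
  }
  object Handler {
    // apply lets us write Handler { ... } instead of Handler.apply({ ... })
    def apply(block: => String): Handler = new Handler(block)
  }

  /** Returns (evaluations before run, result of run, evaluations after run). */
  def demo(): (Int, String, Int) = {
    var evaluations = 0
    val index = Handler { evaluations += 1; "Hi there" }
    val before = evaluations // defining index did not evaluate the block
    val body = index.run()   // now it is evaluated
    (before, body, evaluations)
  }

  def main(args: Array[String]): Unit =
    println(demo()) // (0,Hi there,1)
}
```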
It's the Action object being called with an expression block as its argument. (The apply method is used under the hood.)
Action.apply({
  Ok("Hello world")
})
A simple example (from here) is as follows (look at comments in code):
case class Logging[A](action: Action[A]) extends Action[A] {
  def apply(request: Request[A]): Result = { // apply method which is called on expression
    Logger.info("Calling action")
    action(request) // action being called on further with the request provided to Logging Action
  }
  lazy val parser = action.parser
}
Now you can use it to wrap any other action value:
def index = Logging { // Expression argument starts
  Action { // Action argument (goes under request)
    Ok("Hello World")
  }
}
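Also, the case you mentioned, def index = {, does not produce an Action: it returns whatever the last expression in the block evaluates to. Only the procedure syntax without the equals sign, def index {, returns Unit, as if it were written def index: Unit = {.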

Using `err` in a Child Parser

In the following Parser:
object Foo extends JavaTokenParsers {
  def word(x: String) = s"\\b$x\\b".r
  lazy val expr = aSentence | something
  lazy val aSentence = noun ~ verb ~ obj
  lazy val noun = word("noun")
  lazy val verb = word("verb") | err("not a verb!")
  lazy val obj = word("object")
  lazy val something = word("FOO")
}
It will parse noun verb object.
scala> Foo.parseAll(Foo.expr, "noun verb object")
res1: Foo.ParseResult[java.io.Serializable] = [1.17] parsed: ((noun~verb)~object)
But, when entering a valid noun, but an invalid verb, why won't the err("not a verb!") return an Error with that particular error message?
scala> Foo.parseAll(Foo.expr, "noun vedsfasdf")
res2: Foo.ParseResult[java.io.Serializable] =
[1.6] failure: string matching regex `\bverb\b' expected but `v' found
noun vedsfasdf
^
credit: Thanks to Travis Brown for explaining the need for the word function here.
This question seems similar, but I'm not sure how to handle err with the ~ function.
Here's another question you might ask: why isn't it complaining that it expected the word "FOO" but got "noun"? After all, if it fails to parse aSentence, it's then going to try something.
The culprit should be obvious when you think about it: what in that source code is taking two Failure results and choosing one? | (aka append).
This method on Parser will feed the input to both parsers, and then call append on ParseResult. That method is abstract at that level, and defined on Success, Failure and Error in different ways.
On both Success and Error, it always takes this (that is, the result of the parser on the left). On Failure, though, it does something else:
case class Failure(override val msg: String, override val next: Input) extends NoSuccess(msg, next) {
  /** The toString method of a Failure yields an error message. */
  override def toString = "["+next.pos+"] failure: "+msg+"\n\n"+next.pos.longString
  def append[U >: Nothing](a: => ParseResult[U]): ParseResult[U] = { val alt = a; alt match {
    case Success(_, _) => alt
    case ns: NoSuccess => if (alt.next.pos < next.pos) this else alt
  }}
}
Or, in other words, if both sides have failed, then it will take the side that read the most of the input (which is why it won't complain about a missing FOO), but if both have read the same amount, it will give precedence to the second failure.
I do wonder if it shouldn't check whether the right side is an Error, and, if so, return that. After all, if the left side is an Error, it always returns that. This looks suspicious to me, but maybe it's supposed to be that way. But I digress.
Back to the problem: it would seem that it should have gone with err, as they both consumed the same amount of input, right? Well... Here's the thing: RegexParsers skips whiteSpace first, but only for regex and literal-string parsers. It does not apply to the other methods, including err.
That means that err's input is at the whitespace, while the word's input is at the word, and, therefore, further on the input. Try this:
lazy val verb = word("verb") | " *".r ~ err("not a verb!")
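To see why that one space decides which message wins, here is a simplified model of the NoSuccess branch of append. It is not the library's code: Fail, Err and the integer offsets are stand-ins made up for this sketch. For the input "noun vedsfasdf", the space sits at offset 4 and the "v" at offset 5.

```scala
object AppendSketch {
  // Hypothetical stand-ins for the library's Failure and Error results;
  // pos is the input offset at which the parser reported the problem.
  sealed trait NoSuccess { def msg: String; def pos: Int }
  final case class Fail(msg: String, pos: Int) extends NoSuccess
  final case class Err(msg: String, pos: Int) extends NoSuccess

  // Same rule as the NoSuccess case of Failure.append quoted above:
  // keep the left failure only if the right side stopped earlier in the input.
  def append(left: Fail, right: NoSuccess): NoSuccess =
    if (right.pos < left.pos) left else right

  val wordFailure   = Fail("regex \\bverb\\b expected", 5) // whitespace skipped, fails at 'v'
  val errAtSpace    = Err("not a verb!", 4)                // err alone: still at the space
  val errAfterSpace = Err("not a verb!", 5)                // " *".r ~ err: whitespace consumed first

  def main(args: Array[String]): Unit = {
    println(append(wordFailure, errAtSpace))    // the Fail wins: it got further into the input
    println(append(wordFailure, errAfterSpace)) // tie on position, so the Err on the right wins
  }
}
```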
Arguably, err ought to be overridden by RegexParsers to do the right thing (tm). Since Scala Parser Combinators is now a separate project, I suggest you open an issue and follow it up with a Pull Request implementing the change. It will have the impact of changing error messages for some parser (well, that's the whole purpose of changing it :).