We are building a parser for a SQL-like DSL. As in SQL, it has multiple blocks, namely select, where, order by, group by, etc. We made some parts of the query, such as where and order by, optional. When we do this, the parser silently ignores input errors in those blocks.
DSL Code Snippet
case class Query(select: Select, where: Where)
case class Select(cols: Seq[String])
case class Where(conditions: Seq[String])
object QueryLanguage extends StandardTokenParsers {
  lazy val sql: Parser[Query] =
    select_block ~ (where_block?) ^^ { case slct ~ whr => Query(slct, whr.getOrElse(null)) }

  lazy val select_block: Parser[Select] =
    "SELECT" ~> rep1sep(ident, ",") ^^ { case cols => Select(cols) }

  lazy val where_block: Parser[Where] =
    "WHERE" ~> rep1sep(ident, "and") ^^ { case conds => Where(conds) }
}
Any syntax errors in the select block are reported, but errors in the other blocks are not. This is frustrating for someone who wants a where block in their query: if the block has an error, it is silently dropped rather than reported.
For example, a query like the one below simply ignores the WHERE clause without reporting an error (note that the and keyword is misspelled):
select A, B
WHERE a=1 andd b=2
The query parses correctly if I fix the spelling in the input, or if I make the where clause mandatory in the DSL, as shown below:
lazy val sql: Parser[Query] = select ~ where ~ (orderby?)
Is there another way to handle this or override this default behavior?
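One possible direction (a sketch only; it assumes the grammar is currently run with parse, which succeeds as soon as a prefix of the input matches): wrap the top-level parser in phrase, so any tokens left over after the optional where_block gives up are reported as a failure instead of being silently dropped. The parseQuery helper below is hypothetical and would live inside QueryLanguage; keyword registration via lexical.reserved is elided, as in the snippet above.

// Sketch: phrase only succeeds if every token is consumed, so the
// remainder left behind by a half-parsed WHERE clause ("=1 andd b=2"
// in the example above) surfaces as a failure.
def parseQuery(input: String): ParseResult[Query] =
  phrase(sql)(new lexical.Scanner(input))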
I have a file where each row is a JSON array.
I am reading each line of the file, converting each row into a JSON array, and then converting each element to a case class using spray-json.
I have this so far:
for (line <- source.getLines().take(10)) {
  val jsonArr = line.parseJson.convertTo[JsArray]
  for (ele <- jsonArr.elements) {
    val tryUser = Try(ele.convertTo[User])
  }
}
How could I convert this entire process into a single line statement?
val users: Seq[User] = source.getLines.take(10).map(line => line.parseJson.convertTo[JsonArray].elements.map(ele => Try(ele.convertTo[User])
The error is:
found : Iterator[Nothing]
Note: I used Scala 2.13.6 for all my examples.
There is a lot to unpack in these few lines of code. First of all, I'll share some code that we can use to generate some meaningful input to play around with.
object User {
  import scala.util.Random

  private def randomId: Int = Random.nextInt(900000) + 100000

  private def randomName: String = Iterator
    .continually(Random.nextInt(26) + 'a')
    .map(_.toChar)
    .take(6)
    .mkString

  def randomJson(): String = s"""{"id":$randomId,"name":"$randomName"}"""

  def randomJsonArray(size: Int): String =
    Iterator.continually(randomJson()).take(size).mkString("[", ",", "]")
}
final case class User(id: Int, name: String)
import scala.util.{Try, Success, Failure}
import spray.json._
import DefaultJsonProtocol._
implicit val UserFormat: RootJsonFormat[User] = jsonFormat2(User.apply)
This is just some scaffolding to define some User domain object and come up with a way to generate a JSON representation of an array of such objects so that we can then use a JSON library (spray-json in this case) to parse it back into what we want.
Now, going back to your question. This is a possible way to massage your data into its parsed representation. It may not fit 100% what you are trying to do, but there's some nuance in the data types involved and how they work:
val parsedUsers: Iterator[Try[User]] =
  for {
    line <- Iterator.continually(User.randomJsonArray(4)).take(10)
    element <- line.parseJson.convertTo[JsArray].elements
  } yield Try(element.convertTo[User])
First difference: notice that I use the for comprehension in a form in which the "outcome" of an iteration is not a side effect (for (something) { do something }) but an actual value (for (something) yield { return a value }).
Second difference: I explicitly asked for an Iterator[Try[User]] rather than a Seq[User]. We could go very deep down a rabbit hole on why the types are what they are here, but the simple explanation is that a for ... yield expression:
returns the same type as the first generator -- if you start with val ns: Iterator[Int] and write for (n <- ns) ..., you'll get an Iterator at the end (see the sketch after this list)
if you nest generators, they need to be compatible with the type of the "outermost" one
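A minimal illustration of the first point (a toy iterator, unrelated to the JSON code above):

val ns: Iterator[Int] = Iterator(1, 2, 3)
// The result type follows the first generator: because ns is an
// Iterator, the yield produces an Iterator[Int], not a List or a Seq.
val doubled: Iterator[Int] = for (n <- ns) yield n * 2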
You can read more on for comprehensions on the Tour of Scala and the Scala Book.
One possible way of consuming this is the following:
for (user <- parsedUsers) {
  user match {
    case Success(user) => println(s"parsed object $user")
    case Failure(error) => println(s"ERROR: '${error.getMessage}'")
  }
}
As for how to turn this into a "one liner": for comprehensions are syntactic sugar that the compiler expands, turning every nested generator into a flatMap call and the final one into map, as in the following example (which yields a result equivalent to the for comprehension above and is very close to what the compiler produces automatically):
val parsedUsers: Iterator[Try[User]] = Iterator
.continually(User.randomJsonArray(4))
.take(10)
.flatMap(line =>
line.parseJson
.convertTo[JsArray]
.elements
.map(element => Try(element.convertTo[User]))
)
One note I would like to add: be mindful of readability. Some teams prefer for comprehensions, others prefer rolling their own flatMap/map chains. Coder's discretion is advised.
You can play around with this code here on Scastie (and here is the version with the flatMap/map calls).
I am following this tutorial on GraphQL with Sangria. I am wondering about the following line
val JsObject(fields) = requestJSON
where requestJSON is a JsValue. This way of assigning fields is new to me, and my question is whether you could name that pattern or provide a link to a tutorial covering this structure.
The important thing to know is that val definitions support a pattern on the left-hand side of the assignment, thus providing (a subset of the functionality of) pattern matching.
So, your example is equivalent to:
val fields = requestJSON match {
  case JsObject(foo) => foo
}
See Scala Language Specification Section 4.1 Value Declarations and Definitions for details.
So, for example, if you have a list l and you want to assign the first element and the rest, you could write:
val x :: xs = l
Or, for the fairly common case where a method returns a tuple, you could write:
val (result1, result2) = foo()
It is the extractor pattern: you can reach the same result by implementing the unapply method on an arbitrary object (a hand-written version is sketched after the example below). When you create a case class, the compiler generates an unapply method for you, so you can do:
case class Person(name : String, surname : String)
val person = Person("gianluca", "aguzzi")
val Person(name, surname) = person
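For comparison, here is a minimal hand-rolled extractor (PersonLike and PersonExtractor are hypothetical names, just for illustration):

class PersonLike(val name: String, val surname: String)

object PersonExtractor {
  // Returning Some(...) makes the match succeed and binds the two values.
  def unapply(p: PersonLike): Option[(String, String)] =
    Some((p.name, p.surname))
}

val p = new PersonLike("gianluca", "aguzzi")
val PersonExtractor(name, surname) = p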
I have been trying to build a SQL parser with the scala-parser-combinator library, which I've simplified greatly into the code below.
class Expression
case class FalseExpr() extends Expression
case class TrueExpr() extends Expression
case class AndExpression(expr1: Expression, expr2: Expression) extends Expression
object SimpleSqlParser {
  def parse(sql: String): Try[Expression] = new SimpleSqlParser().parse(sql)
}
class SimpleSqlParser extends RegexParsers {
  def parse(sql: String): Try[_ <: Expression] = parseAll(expression, sql) match {
    case Success(matched, _) => scala.util.Success(matched)
    case Failure(msg, remaining) => scala.util.Failure(new Exception(
      "Parser failed: " + msg + ", remaining: " + remaining.source.toString.drop(remaining.offset)))
    case Error(msg, _) => scala.util.Failure(new Exception(msg))
  }

  private def expression: Parser[_ <: Expression] =
    andExpr | falseExpr | trueExpr

  private def falseExpr: Parser[FalseExpr] =
    "false" ^^ (_ => FalseExpr())

  private def trueExpr: Parser[TrueExpr] =
    "true" ^^ (_ => TrueExpr())

  private def andExpr: Parser[Expression] =
    expression ~ "and" ~ expression ^^ { case e1 ~ and ~ e2 => AndExpression(e1, e2) }
}
Without the 'and' parsing, it works fine. But I want to be able to parse things like 'true AND (false OR true)', for example. When I add the 'and' part to the definition of an expression, I get a StackOverflowError: the stack alternates between the definitions of 'andExpr' and 'expression'.
I understand why this is happening: the definition of expression begins with andExpr, and the definition of andExpr begins with expression. But this seems like the most natural way to model the problem. In reality an expression could also be LIKE, EQUALS, etc. Is there another way to model this kind of thing in general, to get around the problem of recursive definitions?
scala.util.parsing.combinator.RegexParsers cannot handle left-recursive grammars. Your grammar can be summarized by the following production rules:
expression -> andExpr | falseExpr | trueExpr
...
andExpr -> expression "and" expression
expression is indirectly left-recursive via andExpr.
To avoid the infinite recursion, you need to reformulate the grammar so that it is not left-recursive anymore. One frequently-used way is to use repetition combinators, such as chainl1:
private def expression: Parser[_ <: Expression] =
  chainl1(falseExpr | trueExpr, "and" ^^^ { AndExpression(_, _) })
Live on Scastie
The new expression matches one or more falseExpr/trueExpr, separated by "and", and combines the matched elements with AndExpression in a left-associative way. Conceptually, it corresponds to the following production rule:
expression -> (falseExpr | trueExpr) ("and" (falseExpr | trueExpr))*
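As a usage sketch (the exact rendering of the ParseResult may vary), a chain of conjunctions now folds from the left:

// Using SimpleSqlParser with the chainl1-based expression above:
SimpleSqlParser.parse("true and false and true")
// => Success(AndExpression(AndExpression(TrueExpr(), FalseExpr()), TrueExpr()))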
If your grammar contains many tangled left-recursive production rules, you might want to consider other parser combinator libraries, such as GLL combinators, that directly support left recursion.
In the following Parser:
object Foo extends JavaTokenParsers {
  def word(x: String) = s"\\b$x\\b".r

  lazy val expr = aSentence | something
  lazy val aSentence = noun ~ verb ~ obj
  lazy val noun = word("noun")
  lazy val verb = word("verb") | err("not a verb!")
  lazy val obj = word("object")
  lazy val something = word("FOO")
}
It will parse noun verb object.
scala> Foo.parseAll(Foo.expr, "noun verb object")
res1: Foo.ParseResult[java.io.Serializable] = [1.17] parsed: ((noun~verb)~object)
But when entering a valid noun followed by an invalid verb, why won't err("not a verb!") return an Error with that particular error message?
scala> Foo.parseAll(Foo.expr, "noun vedsfasdf")
res2: Foo.ParseResult[java.io.Serializable] =
[1.6] failure: string matching regex `\bverb\b' expected but `v' found
noun vedsfasdf
^
credit: Thanks to Travis Brown for explaining the need for the word function here.
This question seems similar, but I'm not sure how to handle err with the ~ function.
Here's another question you might ask: why isn't it complaining that it expected the word "FOO" but got "noun"? After all, if it fails to parse aSentence, it's then going to try something.
The culprit should be obvious when you think about it: what in that source code takes two Failure results and chooses one of them? | (a.k.a. append).
This method on Parser feeds the input to both parsers and then calls append on the ParseResult. That method is abstract at that level and defined differently on Success, Failure and Error.
On both Success and Error, append always takes this (that is, the result of the parser on the left). On Failure, though, it does something else:
case class Failure(override val msg: String, override val next: Input) extends NoSuccess(msg, next) {
  /** The toString method of a Failure yields an error message. */
  override def toString = "["+next.pos+"] failure: "+msg+"\n\n"+next.pos.longString

  def append[U >: Nothing](a: => ParseResult[U]): ParseResult[U] = { val alt = a; alt match {
    case Success(_, _) => alt
    case ns: NoSuccess => if (alt.next.pos < next.pos) this else alt
  }}
}
Or, in other words: if both sides have failed, it takes the side that consumed the most input (which is why it won't complain about a missing FOO); but if both have read the same amount, it gives precedence to the second failure.
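Here is a minimal sketch of that tie-breaking rule (the Demo object is hypothetical, not from the question):

import scala.util.parsing.combinator.RegexParsers

object Demo extends RegexParsers {
  // Both alternatives consume "foo" and then fail at the same position,
  // so append returns the second failure: `baz' expected.
  val p = ("foo" ~ "bar") | ("foo" ~ "baz")
}

Demo.parseAll(Demo.p, "foo qux")
// [1.5] failure: `baz' expected but `q' found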
I do wonder whether it shouldn't check if the right side is an Error and, if so, return that. After all, if the left side is an Error, it always returns that. This looks suspicious to me, but maybe it's supposed to be that way. But I digress.
Back to the problem: it would seem that it should have gone with err, as both consumed the same amount of input, right? Well... here's the thing: regex parsers skip whiteSpace first, but only for regex literals and literal strings. That does not apply to other methods, including err.
That means that err's input position is at the whitespace, while word's input is at the word, and therefore further along the input. Try this:
lazy val verb = word("verb") | " *".r ~ err("not a verb!")
Arguably, err ought to be overridden by RegexParsers to do the right thing (tm). Since Scala Parser Combinators is now a separate project, I suggest you open an issue and follow it up with a pull request implementing the change. It will have the impact of changing error messages for some parsers (well, that's the whole point of changing it :).
I am writing a DSL using Scala parser combinators and have a working version that can read a single file and parse it. However, I would like to split my input into several files where some files are 'standard' and can be used with any top-level file. What I would like is something like:
import "a.dsl"
import "b.dsl"
// rest of file using {a, b}
It isn't important what order the files are read in, or that something is necessarily 'defined' before being referred to, so parsing the top-level file first and then parsing the closure of all imports into a single model is sufficient. I will then post-process the resulting model for my own purposes.
The question I have is: is there a reasonable way of accomplishing this? If necessary I could iterate over the closure, parsing each file into a separate model, and manually 'merge' the resulting models, but this feels clunky and seems ugly to me.
BTW, I am using an extension of StandardTokenParsers, if that matters.
I think the only approach would be to open and parse the file indicated by the import directly. From there you can create a sub-expression tree for the module. You may not need to manually merge the trees when parsing; for example, if you're already using ^^ and/or ^^^ to return your own expressions, then you should be able to simply emit the relevant expression type in the correct place within the tree, for example:
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
import scala.io.Source

object Example {
  sealed trait Expr
  case class Imports(modules: List[Module]) extends Expr
  case class Module(modulePath: String, root: Option[Expr]) extends Expr
  case class BracedExpr(x: String, y: String) extends Expr
  case class Main(imports: Imports, braced: BracedExpr) extends Expr

  class BlahTest extends StandardTokenParsers {
    def importExpr: Parser[Module] = "import" ~> "\"" ~> stringLit <~ "\"" ^^ {
      case modulePath =>
        // you could use something other than `expr` below if you
        // wanted to limit the expressions available in modules,
        // e.g. you could stop one module importing another.
        phrase(expr)(new lexical.Scanner(Source.fromFile(modulePath).mkString)) match {
          case Success(result, _) =>
            Module(modulePath, Some(result))
          case failure: NoSuccess =>
            // TODO log or act on failure
            Module(modulePath, None)
        }
    }

    def prologExprs = rep(importExpr) ^^ {
      case modules =>
        Imports(modules)
    }

    def bracedExpr = "{" ~> stringLit ~ "," ~ stringLit <~ "}" ^^ {
      case x ~ "," ~ y =>
        BracedExpr(x, y)
    }

    def bodyExprs = bracedExpr

    def expr = prologExprs ~ bodyExprs ^^ {
      case prolog ~ body =>
        Main(prolog, body)
    }
  }
}
You could simply add an eval method to your Expr trait, implement it as necessary on each subclass, and then have a visitor recursively descend your AST. In this manner you would not need to manually merge expression trees together.
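A minimal sketch of that idea, reusing the Expr hierarchy from the code above (the string-concatenation semantics of eval are made up purely for illustration):

sealed trait Expr {
  def eval: String
}

case class BracedExpr(x: String, y: String) extends Expr {
  // Leaf node: nothing further to descend into.
  def eval: String = s"$x-$y"
}

case class Module(modulePath: String, root: Option[Expr]) extends Expr {
  // Recurse into the parsed module body, if parsing it succeeded.
  def eval: String = root.map(_.eval).getOrElse("")
}

case class Imports(modules: List[Module]) extends Expr {
  // Recursing here is what makes manual merging unnecessary:
  // each imported tree is evaluated in place within the main tree.
  def eval: String = modules.map(_.eval).mkString("\n")
}

case class Main(imports: Imports, braced: BracedExpr) extends Expr {
  def eval: String = imports.eval + "\n" + braced.eval
}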