I am developing a lexical analyzer for my programming language. I want to produce an error token for a string that has an opening quote but no closing quote. Ex: "hello
class SimpleLexer extends StdLexical {
import scala.util.parsing.input.CharArrayReader.EofCh
def regex(r: Regex): Parser[String] = new Parser[String] {
def apply(in: Input) = {
val source = in.source
val offset = in.offset
(r findPrefixMatchOf (source.subSequence(offset, source.length))) match {
case Some(matched) =>
Success(source.subSequence(offset, offset + matched.end).toString,
in.drop(matched.end))
case None =>
Failure("string matching regex `" + r + "' expected but `" + in.first + "' found", in.drop(0))
}
}
}
override def token: Parser[Token] = {
// Adapted from StdLexical
(
'\"' ~ rep( chrExcept('\"', '\n','\t','\b','\f','\r', EofCh) ) ~ '\"' ^^ { case '\"' ~ chars ~ '\"' => StringLit(chars mkString "") }
|'\"' ~> failure("Unclosed string: "+"??") // I want produce fail string
|EofCh ^^^ EOF
|delim
)
}
override def whitespace: Parser[Any] = rep(
whitespaceChar
| '/' ~ '*' ~ comment
| '/' ~ '*' ~> failure("unclosed comment"))
override protected def comment: Parser[Any] = (
'*' ~ '/' ^^ { case _ => ' ' }
| chrExcept(EofCh) ~ comment)
}
Example:
input: " safs i
output: ErrorToken(Unclosed string: " safs i)
Can you help me solve this problem?
Thanks.
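My answer to your question: add a second alternative that matches an opening quote followed by string characters but no closing quote, and map it to an ErrorToken: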
override def token: Parser[Token] = {
// Adapted from StdLexical
(
'\"' ~ rep( chrExcept('\"', '\n','\t','\b','\f','\r', EofCh) ) ~ '\"' ^^ { case '\"' ~ chars ~ '\"' => StringLit(chars mkString "") }
|'\"' ~> rep( chrExcept('\"', '\n','\t','\b','\f','\r', EofCh) ) ^^ {chars => ErrorToken(("\"" :: chars) mkString "")}
|EofCh ^^^ EOF
|delim
)
}
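For reference, here is a minimal, self-contained sketch of the same idea. It assumes the scala-parser-combinators library is on the classpath; the object name UnclosedStringDemo, the helper tokens, and the "Unclosed string: " message prefix are illustrative choices, not part of the original code.

```scala
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.CharArrayReader.EofCh
import scala.util.parsing.input.Reader

object UnclosedStringDemo extends StdLexical {
  delimiters ++= List("(", ")", ";")

  // Any character that may appear inside a string literal.
  private def strChar: Parser[Char] = chrExcept('"', '\n', EofCh)

  override def token: Parser[Token] = (
      // Well-formed literal: opening quote, characters, closing quote.
      '"' ~> rep(strChar) <~ '"' ^^ { chars => StringLit(chars.mkString) }
      // Fallback: opening quote with no closing quote becomes an ErrorToken.
    | '"' ~> rep(strChar) ^^ { chars => ErrorToken("Unclosed string: \"" + chars.mkString) }
    | EofCh ^^^ EOF
    | delim
  )

  // Hypothetical helper: run the scanner to the end and collect all tokens.
  def tokens(input: String): List[Token] = {
    var scanner: Reader[Token] = new Scanner(input)
    val buf = List.newBuilder[Token]
    while (!scanner.atEnd) {
      buf += scanner.first
      scanner = scanner.rest
    }
    buf.result()
  }
}
```

The order of the alternatives matters: the well-formed literal has to be tried first, otherwise the fallback would also swallow the body of every closed string.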
I have a combinator and a result converter that looks like so:
// parses a line like so:
//
// 2
// 00:00:01.610 --> 00:00:02.620 align:start position:0%
//
private def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber ~ whiteSpace).? ~>
time ~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
case
startTime ~ _ ~ endTime ~ _ ~ _
=> SubtitleBlock(startTime, endTime, List(""))
}
Because the arrow, textLine and eol are not important to my result converter, I was hoping I could use <~ and ~> in the right places within my combinator so that my converter doesn't have to deal with them. As an experiment, I changed the first ~ in the parser to <~ and removed the ~ _ where the "arrow" would be matched in the case statement, like so:
private def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber ~ whiteSpace).? ~>
time <~ arrow ~ time ~ opt(textLine) ~ eol
} ^^ {
case
startTime ~ endTime ~ _ ~ _
=> SubtitleBlock(startTime, endTime, List(""))
}
However, I get red-squigglies in IntelliJ with the error message:
Error:(44, 31) constructor cannot be instantiated to expected type;
found : caption.vttdissector.VttParsers.~[a,b] required: Int
startTime ~ endTime ~ _ ~ _
What am I doing wrong?
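Since you didn't insert any parentheses in the chain of ~ and <~, most matched subexpressions are thrown out "with the bathwater" (or rather "with the whitespace and arrows"). In Scala, ~ and ~> bind more tightly than <~, so time <~ arrow ~ time ~ opt(textLine) ~ eol is read as time <~ (arrow ~ time ~ opt(textLine) ~ eol), and everything to the right of <~ is discarded. Just insert some parentheses.
Here is the general pattern it should follow: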
(irrelevant ~> irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
(irrelevant ~> RELEVANT <~ irrelevant <~ irrelevant) ~
...
i.e. every "relevant" subexpression is surrounded by irrelevant stuff and a pair of parentheses, and then the parenthesized subexpressions are connected by ~'s.
Your example:
import scala.util.parsing.combinator._
import scala.util.{Either, Left, Right}
case class SubtitleBlock(startTime: String, endTime: String, text: List[String])
object YourParser extends RegexParsers {
def subtitleHeader: Parser[SubtitleBlock] = {
(subtitleNumber.? ~> time <~ arrow) ~
time ~
(opt(textLine) <~ eol)
} ^^ {
case startTime ~ endTime ~ _ => SubtitleBlock(startTime, endTime, Nil)
}
override val whiteSpace = "[ \t]+".r
def arrow: Parser[String] = "-->".r
def subtitleNumber: Parser[String] = "\\d+".r
def time: Parser[String] = "\\d{2}:\\d{2}:\\d{2}\\.\\d{3}".r // dot escaped so it only matches a literal '.'
def textLine: Parser[String] = ".*".r
def eol: Parser[String] = "\n".r
def parseStuff(s: String): scala.util.Either[String, SubtitleBlock] =
parseAll(subtitleHeader, s) match {
case Success(t, _) => scala.util.Right(t)
case f => scala.util.Left(f.toString)
}
def main(args: Array[String]): Unit = {
val examples: List[String] = List(
"2 00:00:01.610 --> 00:00:02.620 align:start position:0%\n"
) ++ args.map(_ + "\n")
for (x <- examples) {
println(parseStuff(x))
}
}
}
This prints:
Right(SubtitleBlock(00:00:01.610,00:00:02.620,List()))
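To see the precedence effect in isolation, here is a small sketch, assuming scala-parser-combinators is available. The names PrecedenceDemo, word, first and both are made up for this illustration.

```scala
import scala.util.parsing.combinator.RegexParsers

object PrecedenceDemo extends RegexParsers {
  def word: Parser[String] = "[a-z]+".r
  def arrow: Parser[String] = "-->"

  // Read as word <~ (arrow ~ word): the second word is discarded as well.
  def unparenthesized: Parser[String] = word <~ arrow ~ word

  // Parenthesized: only the arrow is dropped, both words survive.
  def parenthesized: Parser[String ~ String] = (word <~ arrow) ~ word

  // Result of the unparenthesized parser, or None on a parse failure.
  def first(s: String): Option[String] = parseAll(unparenthesized, s) match {
    case Success(r, _) => Some(r)
    case _             => None
  }

  // Result of the parenthesized parser as a plain tuple, or None on failure.
  def both(s: String): Option[(String, String)] = parseAll(parenthesized, s) match {
    case Success(a ~ b, _) => Some((a, b))
    case _                 => None
  }
}
```

With the input "a --> b", first keeps only "a", while both returns the pair ("a", "b"). The static types already show the difference: Parser[String] versus Parser[String ~ String].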
I am writing a parser combinator to parse simple control-flow statements
and execute some code. The structure of the language is roughly this:
val resultId = 200
val s = s"""(IF $resultId == 100 GOTO NODE-1-->NODE-2) (ELSE IF $resultId > 100 GOTO NODE-1-->NODE-3) (ELSE GOTO NODE-1-->NODE-4)""".stripMargin
private val result= new ConditionalParserCombinator().run(s)
In the above scenario, for example, I should get GOTO NODE-1-->NODE-3, but instead I get false after the else expression is evaluated. The code of the combinator is outlined below:
final class ConditionalParserCombinator extends JavaTokenParsers with ParserCombinatorLike {
def IF = "IF"
def ELSE = "ELSE"
def ELSEIF = ELSE ~ IF
def NULL = "NULL"
def GOTO = "GOTO"
def node_id = wholeNumber | floatingPointNumber | stringLiteral
def NODE = "NODE" ~ "-" ~ node_id ^^ (e ⇒ NodeExpression(e._2))
def EDGE = NODE ~ "-->" ~ NODE ^^ (e ⇒ EdgeExpression(e._1._1, e._2))
def lhs = ident | wholeNumber | floatingPointNumber | stringLiteral
def rhs = ident | wholeNumber | floatingPointNumber | stringLiteral | NULL
def operator = "==" | "*" | "/" | "||" | "&&" | ">" | "<" | ">=" | "<="
def block = GOTO ~ EDGE
def expression_block = lhs ~ operator ~ rhs ~ block ^^ {
case lhs ~ operator ~ rhs ~ block ⇒ ExpressionBlock(lhs, rhs, operator, block._2)
}
def ifExpression = IF ~ expression_block ^^ (e ⇒ e._2.operator match {
case "==" ⇒ if (e._2.lhs == e._2.rhs) Block(e._2.block) else false
case ">" ⇒ if (e._2.lhs > e._2.rhs) Block(e._2.block) else false
case "<" ⇒ if (e._2.lhs < e._2.rhs) Block(e._2.block) else false
case _ ⇒ false
})
def elseIFExpression = ELSEIF ~ expression_block ^^ (e ⇒ e._2.operator match {
case "==" ⇒ if (e._2.lhs == e._2.rhs) Block(e._2.block) else false
case ">" ⇒ if (e._2.lhs > e._2.rhs) {
println("matched elseif")
Block(e._2.block)
} else false
case "<" ⇒ if (e._2.lhs < e._2.rhs) Block(e._2.block) else false
case _ ⇒ false
})
def elseExpression = ELSE ~ block ^^ (e ⇒ Block(e._2._2))
override def grammar = "(" ~> log(ifExpression)("ifexpression") <~ ")" ~!
"(" ~> log(elseIFExpression)("elseifexpression") <~ ")" ~!
"(" ~> log(elseExpression)("elseexpression") <~ ")"
}
I am printing result.get and I see false as the result.
Additional details: Block and ExpressionBlock are case classes that are useful for a few things I may do later on.
I think it's cleaner to parse an expression into a type that you can understand (meaning I have custom Product/case classes defined for it) and then evaluate it; these are two different things. In hindsight I am not sure why I got both mixed up. Here's the logic that works:
def IF = "IF"
def ELSE = "ELSE"
def ELSEIF = ELSE ~ IF
def NULL = "NULL"
def GOTO = "GOTO"
def dataType: Parser[DataType] = "[" ~ "Integer" ~ "]" ^^ { e ⇒ DataType("", "Integer") }
def node_id = wholeNumber | floatingPointNumber | stringLiteral
def NODE = "NODE" ~ "-" ~ node_id ^^ (e ⇒ ParseableNode(e._2, DataType({}, "Unit")))
def EDGE = NODE ~ "-->" ~ NODE ^^ (e ⇒ EdgeExpression(e._1._1, e._2))
def lhs = ident | wholeNumber | floatingPointNumber | stringLiteral
def rhs = ident | wholeNumber | floatingPointNumber | stringLiteral | NULL
def operator = "==" | "*" | "/" | "||" | "&&" | ">=" | "<=" | ">" | "<" // longer operators first, otherwise ">" wins before ">=" is tried
def block = GOTO ~ EDGE
def expression_block(expType: ConditionalKind) = dataType ~ lhs ~ operator ~ rhs ~ block ^^ {
case dataType ~ lhs ~ operator ~ rhs ~ block ⇒ ExpressionBlock(ParseableNode(lhs, dataType), ParseableNode(rhs, dataType), operator, block._2, expType)
}
def ifExpression = IF ~ expression_block(ConditionalKind("IF")) ^^ {
case "IF" ~ expression_block ⇒ ExpressionBlock(expression_block.lhs, expression_block.rhs, expression_block.operator, expression_block.block, expression_block.conditionalKind)
}
def elseIFExpression = ELSEIF ~ expression_block(ConditionalKind("ELSEIF")) ^^ {
case "ELSE" ~ "IF" ~ expression_block ⇒ ExpressionBlock(expression_block.lhs, expression_block.rhs, expression_block.operator, expression_block.block, expression_block.conditionalKind)
}
def elseExpression = ELSE ~ block ^^ { case "ELSE" ~ block ⇒ Block(block._2) }
override def grammar = log(ifExpression)("ifexpression") ~ log(elseIFExpression)("elseifexpression") ~ log(elseExpression)("elseexpression") ^^ {
case ifExpression ~ elseIFExpression ~ elseExpression ⇒
ConditionalExpressions(List(ifExpression, elseIFExpression), elseExpression)
}
The above logic works when the parsed result is evaluated like this:
object BasicSelectorExpressionEvaluator extends EvaluatorLike {
override def eval(parseable: Parseable) = parseable match {
case ConditionalExpressions(ifElseIfs, otherwise) ⇒
val mappedIfElseIfs: immutable.Seq[Block] = ifElseIfs.map { e ⇒
println(s"e ==>$e")
e.operator match {
case "==" ⇒ if (e.lhs == e.rhs) {
println("matched ==")
Block(e.block)
} else Block.Unit
case "<" ⇒ if (e.lhs.value.toInt < e.rhs.value.toInt) {
println("matched <")
Block(e.block)
} else Block.Unit
case ">" ⇒ if (e.lhs.value.toInt > e.rhs.value.toInt) {
println("matched >")
Block(e.block)
} else Block.Unit
case "<=" ⇒ if (e.lhs.value.toInt <= e.rhs.value.toInt) {
println("matched <=")
Block(e.block)
} else Block.Unit
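case ">=" ⇒ if (e.lhs.value.toInt >= e.rhs.value.toInt) {
println("matched >=")
Block(e.block)
} else Block.Unit
// Any other operator is not a comparison: treat it as "no match" instead of throwing a MatchError.
case _ ⇒ Block.Unit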
}
}
val filteredMappedIFElseIfs = mappedIfElseIfs.filterNot(e ⇒ e.equals(Block.Unit))
println(s"filteredMappedIFElseIfs == $filteredMappedIFElseIfs")
if (filteredMappedIFElseIfs.nonEmpty) PResult(filteredMappedIFElseIfs.head.block) else PResult(otherwise.block)
}
}
So the above can parse this input:
val s = s""" IF [Integer] $resultId == 100 GOTO NODE-1-->NODE-2 ELSE IF [Integer] $resultId > 100 GOTO NODE-1-->NODE-3 ELSE GOTO NODE-1-->NODE-4""".stripMargin
It could be done better; for example, the grammar seems to violate DRY by embedding the data type in every IF, but I suppose people can build on it.
Edit: also note that the toInt conversion is a bit ugly and needs a better design. I may post an update once I have one. I need to rework the whole grammar now that it works. Suggestions and improvements are welcome; I am still learning.
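As a compact illustration of the parse-then-evaluate split, here is a reduced sketch, assuming scala-parser-combinators is on the classpath. The AST classes Cond and Chain, the integer-only operands, and all method names are simplifications invented for this example; they are not the classes used above.

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object ParseThenEval extends JavaTokenParsers {
  // Hypothetical AST: one condition with its jump target, and a whole chain.
  final case class Cond(lhs: Int, op: String, rhs: Int, target: String)
  final case class Chain(conds: List[Cond], otherwise: String)

  def target: Parser[String] = "GOTO" ~> """NODE-\d+-->NODE-\d+""".r
  // Longer operators come first so ">" does not shadow ">=".
  def op: Parser[String] = ">=" | "<=" | "==" | ">" | "<"
  def cond: Parser[Cond] = wholeNumber ~ op ~ wholeNumber ~ target ^^ {
    case l ~ o ~ r ~ t => Cond(l.toInt, o, r.toInt, t)
  }
  // Step 1: parsing only builds the AST, it does not decide anything.
  def chain: Parser[Chain] =
    ("IF" ~> cond) ~ rep("ELSE" ~> "IF" ~> cond) ~ ("ELSE" ~> target) ^^ {
      case head ~ rest ~ otherwise => Chain(head :: rest, otherwise)
    }

  // Step 2: evaluation is an ordinary function over the AST.
  def holds(c: Cond): Boolean = c.op match {
    case "==" => c.lhs == c.rhs
    case ">"  => c.lhs > c.rhs
    case "<"  => c.lhs < c.rhs
    case ">=" => c.lhs >= c.rhs
    case "<=" => c.lhs <= c.rhs
    case _    => false
  }
  def eval(c: Chain): String =
    c.conds.find(holds).map(_.target).getOrElse(c.otherwise)

  def run(s: String): Option[String] = parseAll(chain, s) match {
    case Success(ast, _) => Some(eval(ast))
    case _               => None
  }
}
```

Because eval picks the first condition that holds and falls back to the ELSE target, the evaluator never has to return a placeholder such as false.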
I am writing a parser in Scala and got stuck at this point:
private def expression : Parser[Expression] = cond | variable | integer | liste | function
private def cond : Parser[Expression] = "if" ~ predicate ~ "then" ~ expression ~ "else" ~ expression ^^ {case _~i~_~t~_~el => Cond(i,t,el)}
private def predicate: Parser[Predicate] = identifier ~ "?" ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~_~el~_ => Predicate(n,el)}
private def function: Parser[Expression] = identifier ~ "(" ~ repsep(expression, ",") ~ ")" ^^{case n~_~el~_ => Function(n,el)}
private def liste: Parser[Expression] = "[" ~ repsep(expression, ",") ~ "]" ^^ {case _~ls~_ => Liste(ls)}
private def variable: Parser[Expression] = identifier ^^ {case v => Variable(v)}
def identifier: Parser[String] = """[a-zA-Z0-9]+""".r ^^ { _.toString }
def integer: Parser[Integer] = num ^^ { case i => Integer(i)}
def num: Parser[String] = """(-?\d*)""".r ^^ {_.toString}
My problem is that when it comes to an "expression", the parser does not always take the right path. For example, given funk(x,y) it tries to parse it as a variable and not as a function.
Any idea?
Change the order of the parsers in your expression parser: put function before variable and after cond. In general, when you compose parsers with the alternative A | B, parser A should not be able to succeed on a mere prefix of an input that B is meant to parse, because | commits to the first alternative that succeeds and never goes back to try B.
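A small sketch of the ordering rule, assuming scala-parser-combinators is available. To keep it self-contained, the parsers return plain strings instead of the Expression classes from the question, and the names OrderingDemo, wrongOrder, rightOrder and run are illustrative.

```scala
import scala.util.parsing.combinator.RegexParsers

object OrderingDemo extends RegexParsers {
  def identifier: Parser[String] = "[a-zA-Z0-9]+".r

  def variable: Parser[String] = identifier ^^ (v => s"Variable($v)")

  def function: Parser[String] =
    identifier ~ ("(" ~> repsep(identifier, ",") <~ ")") ^^ {
      case name ~ args => s"Function($name, ${args.mkString(", ")})"
    }

  // variable succeeds on the prefix "funk", so function is never tried.
  def wrongOrder: Parser[String] = variable | function

  // function is tried first; variable is the fallback when there is no "(".
  def rightOrder: Parser[String] = function | variable

  def run(p: Parser[String], s: String): Option[String] = parseAll(p, s) match {
    case Success(r, _) => Some(r)
    case _             => None
  }
}
```

With wrongOrder, parsing "funk(x,y)" fails because variable consumes "funk" and the rest of the input is left over; with rightOrder the same input is parsed as a function, and a bare "x" still falls through to variable.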
I'm new to Scala. I have a problem with string literals. Here's my code:
import scala.util.matching.Regex
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.combinator.token.StdTokens
import scala.util.parsing.input.CharArrayReader.EofCh
trait SimpleTokens extends StdTokens {
// Adapted from StdTokens
case class FloatLit(chars: String) extends Token {
override def toString = "FloatLit "+chars
}
case class IntLit(chars: String) extends Token {
override def toString = "IntLit "+chars
}
case class BooleanLit(chars: String) extends Token {
override def toString = "BooleanLit " + chars
}
case class StrLit(chars: String) extends Token {
override def toString = "\"" + chars.slice(1,chars.length-1) + "\""
}
}
class SimpleLexer extends StdLexical with SimpleTokens {
import scala.util.parsing.input.CharArrayReader.EofCh
reserved ++= List( "mod", "div", "array","if","then")
delimiters ++= List( ";", "(", ")", "+", "-", "*", "/",".")
def regex(r: Regex): Parser[String] = new Parser[String] {
def apply(in: Input) = {
val source = in.source
val offset = in.offset
(r findPrefixMatchOf (source.subSequence(offset, source.length))) match {
case Some(matched) =>
Success(source.subSequence(offset, offset + matched.end).toString,
in.drop(matched.end))
case None =>
Failure("string matching regex `" + r + "' expected but `" + in.first + "' found", in.drop(0))
}
}
}
override def token: Parser[Token] = {
// Adapted from StdLexical
(
regex("true|false".r) ^^ { BooleanLit(_)}
|regex("[a-z][a-z]*".r) ^^ { processIdent(_) }
|regex("""([0-9]*)(((\.)?[0-9]+(e|E)(\+|-)?[0-9]+)|(\.[0-9]+))""".r) ^^ { FloatLit(_) }
|regex("\\d+".r) ^^ { IntLit(_) }
|regex("""'([^'\"]|'')*'""".r) ^^ {StrLit(_) }
|EofCh ^^^ EOF
|delim
)
}
override def whitespace: Parser[Any] = rep(
whitespaceChar
| '/' ~ '*' ~ comment
| '/' ~ '*' ~> failure("unclosed comment"))
override protected def comment: Parser[Any] = (
'*' ~ '/' ^^ { case _ => ' ' }
| chrExcept(EofCh) ~ comment)
}
Input is: 'this is string
Output should be: ErrorToken(Unclosed string: 'this is string)
But when I run, I receive this:
ErrorToken '
identifier this
identifier is
identifier string
What do I have to do to get the correct output? Thanks for considering my problem.
You can add regex alternatives for the error cases, placed after the alternative for a well-formed string:
|regex("""(')[^']*(\t)""".r) ^^ ("Illegal tab in string:"+_) ^^ ErrorToken
|regex("""(')[^']*""".r) ^^ ("Unclosed string:"+_) ^^ ErrorToken
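The behaviour of the two patterns can be checked with the standard library alone. This is only a sketch: the object StringRegexDemo and the method classify are hypothetical, the closed-string pattern is a simplified version of the one in the question, and the output strings merely imitate the token names.

```scala
import scala.util.matching.Regex

object StringRegexDemo {
  // Well-formed literal: quotes around the body, '' is an escaped quote.
  val closed: Regex = """'([^']|'')*'""".r
  // Fallback: an opening quote and everything up to the next quote or end of input.
  val unclosed: Regex = """'[^']*""".r

  // Try the well-formed pattern first, then the fallback, mirroring the order
  // of the alternatives in the token parser.
  def classify(input: String): String =
    closed.findPrefixOf(input).map(s => s"StrLit($s)")
      .orElse(unclosed.findPrefixOf(input).map(s => s"ErrorToken(Unclosed string: $s)"))
      .getOrElse("no string literal here")
}
```

Since the fallback pattern also matches the start of every well-formed literal, it only gives the intended result when it is tried after the closed-string pattern.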
PPL-2012?
I am using the following object to parse CSV. The parser seems to be working correctly, except that spaces are being stripped out. Could someone help me figure out where that is happening? Thanks.
object CsvParser extends RegexParsers {
override protected val whiteSpace = """[ \t]""".r
def COMMA = ","
def DQUOTE = "\""
def DQUOTE2 = "\"\"" ^^ { case _ => "\"" }
def CR = "\r"
def LF = "\n"
def CRLF = "\r\n"
def TXT = "[^\",\r\n]".r
def record: Parser[List[String]] = rep1sep(field, COMMA)
def field: Parser[String] = (escaped | nonescaped)
def escaped: Parser[String] = (DQUOTE ~> ((TXT | COMMA | CR | LF | DQUOTE2)*) <~ DQUOTE) ^^ { case ls => ls.mkString("") }
def nonescaped: Parser[String] = (TXT*) ^^ { case ls => ls.mkString("") }
def parse(s: String) = parseAll(record, s) match {
case Success(res, _) => res
case _ => List[List[String]]()
}
}
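I think I figured it out. By default, RegexParsers skips anything matching whiteSpace before every literal and regex parser, so I needed to add: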
override val skipWhitespace = false
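The effect of that flag can be seen on a reduced example, assuming scala-parser-combinators is on the classpath. The trait Fields and the objects Skipping and Preserving are invented for this sketch and are much simpler than the CSV grammar above.

```scala
import scala.util.parsing.combinator.RegexParsers

object SkipWhitespaceDemo {
  trait Fields extends RegexParsers {
    // A field is any run of characters other than a comma, read one at a time.
    def field: Parser[String] = rep("[^,]".r) ^^ (_.mkString)
    def record: Parser[List[String]] = rep1sep(field, ",")
    def parse(s: String): List[String] = parseAll(record, s).getOrElse(Nil)
  }

  // Default behaviour: whitespace is skipped before every regex and literal.
  object Skipping extends Fields

  // Whitespace skipping switched off: spaces reach the field parser.
  object Preserving extends Fields {
    override val skipWhitespace = false
  }
}
```

For the input "a b, c", Skipping yields the fields "ab" and "c", while Preserving keeps the spaces and yields "a b" and " c".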