Scala parser combinator for Logo list?
I am trying to make a token-based Scala parser for UCB Logo. The problem I am facing is that in UCB Logo the values in a list can only be delimited by ']', '[' or ' ' (space). If the list contains any other kind of delimiter, its content should be treated as a word.
In short, how can I make a token parser that will consider the following:
[ 4 3 2 ] - should be a list
[ [ 4 3 2 ] ] - should be a list within a list
[ 1 + 2 ] - should be a word inside a list
[ [ 1 2 3 ] + ] - should be a word inside a list
The following parser
'[' ~ rep(chrExcept('[', ']')) ~ ']'
produces these tokens from [ [ 1 2 3 ] + ]:
Tokens: List([, [1 2 3], +, ])
I believe it should produce these tokens instead:
List([, [1 2 3] +, ]) -> i.e. the + sign should be merged with the token [1 2 3].
This is the current code of the lexer (a Lexical subclass) I am using:
package lexical
import scala.language.postfixOps
import scala.util.parsing.combinator.lexical.Lexical
import scala.util.parsing.input.CharSequenceReader._
/**
* Created by Marin on 28/03/16.
*/
class MyLexical extends Lexical with MyTokens {
  def token: Parser[Token] = (
    //procDef ^^ { case first ~ chars => processNewProcedure(chars mkString "") }
    word2 ^^ { case rest => {
        /*val s = if (second.isEmpty) "" else second mkString ""
        val t = if(third.isEmpty) "" else third mkString ""
        val f = if(fourth.isEmpty) "" else fourth mkString ""
        StringLit(s"$first$s$t$f$rest")*/
        println(rest)
        StringLit("Smth")
      }
    }
    | formalChar ~ rep(identChar | digit) ^^ { case first ~ rest => Formal(first :: rest mkString "") }
    | identChar ~ rep(identChar | digit) ^^ { case first ~ rest => processIdent(first :: rest mkString "") }
    | procDigit ^^ { case first ~ second ~ rest => NumericLit((first mkString "") :: second.getOrElse("") :: rest mkString "") }
    | '\"' ~ rep(chrExcept('\"', EofCh)) ~ ' ' ^^ { case '\"' ~ chars ~ ' ' => StringLit(chars mkString "") }
    | EofCh ^^^ EOF
    | delim
    | failure("Illegal character")
  )
  def processNewProcedure(chars: String) =
    if (reserved.contains(chars)) throw new RuntimeException
    else {
      Identifier(chars)
    }
  def procDef = toSeq ~> identChar ~ rep(identChar | elem('_')) <~ formalChar.* <~ endSeq
  def toSeq = 't' ~ 'o' ^^^ "to"
  def endSeq = 'e' ~ 'n' ~ 'd' ^^^ "end"
  def processIdent(name: String) = {
    if (reserved contains name) {
      Keyword(name)
    } else {
      Identifier(name)
    }
  }
  def word = {
    '[' ~ ((whitespaceChar | digit)*) ~ (_delim | identChar) ~ rep(whitespaceChar | digit) ~ ']'
  }
  def word2 = {
    //'[' ~> rep(whitespaceChar | digit) ~> rep(_delim | identChar) <~ rep(whitespaceChar | digit) <~ ']'
    //'[' ~ rep(chrExcept('[', ']')) ~ ']'
    rep1('[') ~ rep1(chrExcept('[', ']') | digit) ~ rep(_delim) ~ rep1(']')
    //rep1('[') ~ identChar ~ rep(']') ~ rep('+') ~ rep1(']')
    //'[' ~ (_delim | chrExcept('[', ']')) ~ ']'
  }
  def word3 = {
    '[' ~> rep(digit | letter | _delim) <~ ']'
  }
  def procDigit = digit.+ ~ '.'.? ~ digit.*
  def identChar = letter | elem('_')
  def formalChar = ':' ~ identChar
  override def whitespace: Parser[Any] = rep[Any] (
    whitespaceChar
    | ';' ~ comment
  )
  def comment: Parser[Any] = rep(chrExcept(EofCh, ';')) ^^ { case _ => ' ' }
  /****** Pure copy-paste ******/
  /** The set of reserved identifiers: these will be returned as `Keyword`s. */
  val reserved = new scala.collection.mutable.HashSet[String]
  /** The set of delimiters (ordering does not matter). */
  val delimiters = new scala.collection.mutable.HashSet[String]
  private lazy val _delim: Parser[Token] = {
    // construct parser for delimiters by |'ing together the parsers for the individual delimiters,
    // starting with the longest one -- otherwise a delimiter D will never be matched if there is
    // another delimiter that is a prefix of D
    def parseDelim(s: String): Parser[Token] = accept(s.toList) ^^ { x => Keyword(s) }
    val d = new Array[String](delimiters.size)
    delimiters.copyToArray(d, 0)
    scala.util.Sorting.quickSort(d)
    (d.toList map parseDelim).foldRight(failure("no matching delimiter"): Parser[Token])((x, y) => y | x)
  }
  protected def delim: Parser[Token] = _delim
}
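For comparison, here is a minimal, dependency-free sketch of one possible reading of the rules above. It is plain recursive descent, not the scala-parser-combinators Lexical API, and it assumes that only numbers and nested lists count as ordinary list items: a bracketed group whose items are all numbers or nested lists becomes a list, otherwise its whole content is kept as a single word inside the list. The names LogoListSketch, Num, LList and Word are hypothetical.

```scala
object LogoListSketch {
  sealed trait Node
  final case class Num(text: String) extends Node          // a numeric list item, e.g. "4"
  final case class LList(items: List[Node]) extends Node   // a bracketed Logo list
  final case class Word(text: String) extends Node         // raw content kept as one word

  // Split on whitespace, treating '[' and ']' as tokens even without surrounding spaces.
  def tokenize(s: String): List[String] =
    s.replace("[", " [ ").replace("]", " ] ").trim.split("\\s+").toList.filter(_.nonEmpty)

  private def isListItem(n: Node): Boolean = n match {
    case Num(_) | LList(_) => true
    case Word(_)           => false
  }

  // Parses "[ ... ]"; returns the node, its normalized source text and the remaining tokens.
  private def parseList(tokens: List[String]): (Node, String, List[String]) = tokens match {
    case "[" :: rest =>
      val (items, raws, after) = parseItems(rest, Nil, Nil)
      val inner = raws.mkString(" ")
      // Any non-numeric atom (e.g. "+") turns the whole content into a single word.
      val node = if (items.forall(isListItem)) LList(items) else LList(List(Word(inner)))
      (node, "[" + inner + "]", after)
    case other =>
      val found = other.headOption.getOrElse("end of input")
      throw new IllegalArgumentException("expected '[' but found " + found)
  }

  // Collects items up to the matching ']', keeping each item's source text alongside it.
  private def parseItems(tokens: List[String], items: List[Node], raws: List[String]): (List[Node], List[String], List[String]) =
    tokens match {
      case Nil         => throw new IllegalArgumentException("missing ']'")
      case "]" :: rest => (items.reverse, raws.reverse, rest)
      case "[" :: _ =>
        val (node, raw, rest) = parseList(tokens)
        parseItems(rest, node :: items, raw :: raws)
      case tok :: rest =>
        val node = if (tok.matches("-?\\d+(\\.\\d+)?")) Num(tok) else Word(tok)
        parseItems(rest, node :: items, tok :: raws)
    }

  def parse(s: String): Node = {
    val (node, _, rest) = parseList(tokenize(s))
    require(rest.isEmpty, "unexpected trailing tokens: " + rest)
    node
  }
}
```

With this reading, parse("[ [ 1 2 3 ] + ]") yields a list containing the single word "[1 2 3] +", which is the merged token the question asks for.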
Related
Scala replace function with comma as ","
For the input below, I have to replace the last comma (,) between two colons (:) with ",".
println(input) //[level:1,File:one,three,Flag:NA][level:1,File:two,Flag:NA]
println(input.replace(",", "\",\""))
I am getting this result:
//[level:1","File:one","three","Flag:NA][level:1","File:two","Flag:NA]
The expected result is:
[level:1","File:one,three","Flag:NA][level:1","File:two","Flag:NA]
Kindly help me.
val str1 = "[level:1,File:one,three,Flag:NA][level:1,File:two,Flag:NA]"
val regex1 = raw"(,)(\w+:)".r
val matches = regex1.findAllMatchIn(str1)
val str2 = matches.foldLeft(str1)({ case (str, m) => str.replaceFirst(m.group(0), "\",\"" + m.group(2)) })
// str2: String = [level:1","File:one,three","Flag:NA][level:1","File:two","Flag:NA]
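A shorter alternative, offered as a sketch rather than as the answer's method: a single replaceAll with the lookahead (?=\w+:) only rewrites commas that are immediately followed by a key and a colon, so the comma inside one,three is left alone. This assumes keys consist only of word characters (\w); CommaBeforeKey and quoteKeySeparators are illustrative names.

```scala
object CommaBeforeKey {
  // Replace only commas that are directly followed by "key:" (word characters, then a colon).
  def quoteKeySeparators(s: String): String =
    s.replaceAll(",(?=\\w+:)", "\",\"")

  def main(args: Array[String]): Unit = {
    val str1 = "[level:1,File:one,three,Flag:NA][level:1,File:two,Flag:NA]"
    println(quoteKeySeparators(str1))
  }
}
```

Because the lookahead does not consume the key, no group needs to be re-inserted into the replacement.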
Scala: How to apply the pattern matching function to a csv file which contains the below mentioned values
def matchcase(x: String): Int = x match {
  case "Iris-setosa" => 10
  case "Iris-virginica" => 20
  case "Iris-versicolor" => 30
  case _ => 0
}
The sample data in the CSV file is as below:
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
51,7,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
53,6.9,3.1,4.9,1.5,Iris-versicolor
103,7.1,3,5.9,2.1,Iris-virginica
104,6.3,2.9,5.6,1.8,Iris-virginica
105,6.5,3,5.8,2.2,Iris-virginica
In your comment you mentioned: "Actually I want to replace the text in the whole file by using the matchcase function". Assuming that you have the data and the function as shown in the question, you can use scala.io.Source.fromFile to read the file:
val data = scala.io.Source.fromFile("input file path")
then call the matchcase function you have written:
val replacedData = data.getLines().map(_.split(",")).map(array => array.init.mkString(",") + "," + matchcase(array.last))
and finally write the output:
import java.io.PrintWriter
new PrintWriter("path to output file") { write(replacedData.mkString("\n")); close() }
You should then have a file with the following data:
1,5.1,3.5,1.4,0.2,10
2,4.9,3,1.4,0.2,10
3,4.7,3.2,1.3,0.2,10
51,7,3.2,4.7,1.4,30
52,6.4,3.2,4.5,1.5,30
53,6.9,3.1,4.9,1.5,30
103,7.1,3,5.9,2.1,20
104,6.3,2.9,5.6,1.8,20
105,6.5,3,5.8,2.2,20
I hope the answer is helpful.
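A variant of the same idea, as a sketch: instead of split(","), which silently drops trailing empty fields (so a line ending in "," would lose a column), it cuts each line at its last comma and maps only the label. IrisRelabel and relabel are illustrative names; matchcase is repeated from the question so the block compiles on its own.

```scala
object IrisRelabel {
  // The mapping function from the question.
  def matchcase(x: String): Int = x match {
    case "Iris-setosa"     => 10
    case "Iris-virginica"  => 20
    case "Iris-versicolor" => 30
    case _                 => 0
  }

  // Keep everything up to and including the last comma; replace the label after it.
  def relabel(line: String): String = {
    val i = line.lastIndexOf(',')
    if (i < 0) line
    else line.substring(0, i + 1) + matchcase(line.substring(i + 1).trim)
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "1,5.1,3.5,1.4,0.2,Iris-setosa",
      "51,7,3.2,4.7,1.4,Iris-versicolor",
      "103,7.1,3,5.9,2.1,Iris-virginica")
    sample.map(relabel).foreach(println)
  }
}
```

Reading and writing the file can stay as in the answer, e.g. data.getLines().map(IrisRelabel.relabel).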
Use the extractor pattern in Scala with a regex.
Given the regex
val regex = "[[0-9]|,|\\.]+([[a-zA-z]|\\-]+)".r
now just pattern match the whole text line by line:
lines.map {
  case regex(name) => Some(name)
  case _ => None
}
Later, use your matchcase function to convert the strings (names) into numbers and replace each name with its number. The code you are looking for (note that String.replace is a literal replacement, so the captured name itself is what gets replaced):
lines.map { line =>
  line match {
    case regex(name) => line.replace(name, matchcase(name).toString)
    case _ => line
  }
}
Scala REPL:
scala> val regex = "[[0-9]|,|\\.]+([[a-zA-z]|\\-]+)".r
regex: scala.util.matching.Regex = [[0-9]|,|\.]+([[a-zA-z]|\-]+)
scala> "1,5.1,3.5,1.4,0.2,Iris-setosa" match { case regex(str) => println(s"name: $str") }
name: Iris-setosa
Next, process the whole text:
scala> val text = """
     | 1,5.1,3.5,1.4,0.2,Iris-setosa
     | 2,4.9,3,1.4,0.2,Iris-setosa
     | 3,4.7,3.2,1.3,0.2,Iris-setosa
     | 51,7,3.2,4.7,1.4,Iris-versicolor
     | 52,6.4,3.2,4.5,1.5,Iris-versicolor
     | 53,6.9,3.1,4.9,1.5,Iris-versicolor
     | 103,7.1,3,5.9,2.1,Iris-virginica
     | 104,6.3,2.9,5.6,1.8,Iris-virginica
     | 105,6.5,3,5.8,2.2,Iris-virginica
     | """.stripMargin
text: String =
"
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
51,7,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
53,6.9,3.1,4.9,1.5,Iris-versicolor
103,7.1,3,5.9,2.1,Iris-virginica
104,6.3,2.9,5.6,1.8,Iris-virginica
105,6.5,3,5.8,2.2,Iris-virginica
"
scala> val lines = text.split("\n").filter(_.trim.nonEmpty)
lines: Array[String] = Array(1,5.1,3.5,1.4,0.2,Iris-setosa, 2,4.9,3,1.4,0.2,Iris-setosa, 3,4.7,3.2,1.3,0.2,Iris-setosa, 51,7,3.2,4.7,1.4,Iris-versicolor, 52,6.4,3.2,4.5,1.5,Iris-versicolor, 53,6.9,3.1,4.9,1.5,Iris-versicolor, 103,7.1,3,5.9,2.1,Iris-virginica, 104,6.3,2.9,5.6,1.8,Iris-virginica, 105,6.5,3,5.8,2.2,Iris-virginica)
scala> lines.map {
     | case regex(name) => Some(name)
     | case _ => None
     | }
res18: Array[Option[String]] = Array(Some(Iris-setosa), Some(Iris-setosa), Some(Iris-setosa), Some(Iris-versicolor), Some(Iris-versicolor), Some(Iris-versicolor), Some(Iris-virginica), Some(Iris-virginica), Some(Iris-virginica))
Now use collect to keep only the values that are Some:
scala> res18.collect { case Some(value) => value }
res19: Array[String] = Array(Iris-setosa, Iris-setosa, Iris-setosa, Iris-versicolor, Iris-versicolor, Iris-versicolor, Iris-virginica, Iris-virginica, Iris-virginica)
scala> res19.mkString("\n")
res20: String =
Iris-setosa
Iris-setosa
Iris-setosa
Iris-versicolor
Iris-versicolor
Iris-versicolor
Iris-virginica
Iris-virginica
Iris-virginica
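The extractor idea can also capture both the prefix and the label in one pattern, so each line is rebuilt directly instead of searching for the name a second time. A sketch, assuming the label is the last field and contains only ASCII letters and hyphens; IrisExtractor, Row and relabel are illustrative names, and matchcase is repeated from the question so the block compiles on its own.

```scala
object IrisExtractor {
  def matchcase(x: String): Int = x match {
    case "Iris-setosa"     => 10
    case "Iris-virginica"  => 20
    case "Iris-versicolor" => 30
    case _                 => 0
  }

  // Group 1: everything up to the last comma; group 2: the trailing label.
  // A Regex used as an extractor must match the whole string.
  private val Row = """(.*,)([A-Za-z-]+)""".r

  def relabel(line: String): String = line match {
    case Row(prefix, name) => prefix + matchcase(name)
    case _                 => line // lines that don't look like a data row are left unchanged
  }
}
```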
Return two columns when mapping through a column list Spark SQL Scala
I want to programmatically give a certain number of fields and, for some of those fields, select the column and pass it to another function that returns a case class of (String, String). So far I have:
val myList = Seq(("a", "b", "c", "d"), ("aa", "bb", "cc","dd"))
val df = myList.toDF("col1","col2","col3","col4")
val fields= "col1,col2"
val myDF = df.select(df.columns.map(c =>
  if (fields.contains(c)) {
    df.col(s"$c") && someUDFThatReturnsAStructTypeOfStringAndString(df.col(s"$c")).alias(s"${c}_processed")
  } else {
    df.col(s"$c")
  }): _*)
Right now this gives me the exception:
org.apache.spark.sql.AnalysisException: cannot resolve '(col1 AND UDF(col1))' due to data type mismatch: differing types in '(col1 AND UDF(col1))' (string and struct< STRING1:string,STRING2:string > )
I want to select:
col1 | <col1.String1, col1.String2> | col2 | <col2.String1, col2.String2> | col3 | col4
"a"  | <"a1", "a2">                 | "b"  | <"b1", "b2">                 | "c"  | "d"
I ended up building SQL expression strings (the same idea as df.selectExpr) and tying together a bunch of expressions with expr(...).
import spark.implicits._
import org.apache.spark.sql.functions.expr
val fields = "col1,col2".split(",")
// The UDF must be registered with spark.udf.register to be callable by name inside these expression strings
val exprToSelect = df.columns.filter(c => fields.contains(c)).map(c => s"someUDFThatReturnsAStructTypeOfStringAndString(${c}) as ${c}_parsed") ++ df.columns
val exprToFilter = df.columns.filter(c => fields.contains(c)).map(c => s"length(${c}_parsed.String1) > 1").reduce(_ + " OR " + _) //error
val exprToFilter2 = df.columns.filter(c => fields.contains(c)).map(c => s"(length(${c}_parsed.String1) < 1)").reduce(_ + " AND " + _) //valid
val exprToSelectValid = df.columns.filter(c => fields.contains(c)).map(c => s"${c}_parsed.String2 as ${c}") ++ df.columns.filterNot(c => fields.contains(c)) //valid
val exprToSelectInValid = Array("concat(" + df.columns.filter(c => fields.contains(c)).map(c => s"${c}_parsed.String1").mkString(", ") + ") as String1") ++ df.columns
val parsedDF = df.select(exprToSelect.map { c => expr(s"$c") }: _*)
val validDF = parsedDF.filter(exprToFilter2)
  .select(exprToSelectValid.map { c => expr(s"$c") }: _*)
val errorDF = parsedDF.filter(exprToFilter)
  .select(exprToSelectInValid.map { c => expr(s"$c") }: _*)
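The exception in the question comes from &&, which builds a boolean AND of two columns instead of selecting both. One way to emit two columns for selected fields is to flatMap over the column names. The sketch below is generic and Spark-free so it can be checked in isolation; expandColumns is a hypothetical helper. With Spark you would presumably instantiate A as Column, pass org.apache.spark.sql.functions.col as plain and something like c => yourUdf(col(c)).alias(s"${c}_processed") as processed (yourUdf is a placeholder), and call df.select(...: _*). It also uses a Set for fields, because "col1,col2".contains(c) on a String is a substring test.

```scala
object ColumnExpansion {
  // For each column, emit one entry, or two (original + processed) when it is listed in `fields`.
  def expandColumns[A](columns: Seq[String], fields: Set[String])(plain: String => A)(processed: String => A): Seq[A] =
    columns.flatMap { c =>
      if (fields.contains(c)) Seq(plain(c), processed(c)) else Seq(plain(c))
    }

  def main(args: Array[String]): Unit = {
    val columns = Seq("col1", "col2", "col3", "col4")
    val fields  = "col1,col2".split(",").toSet
    // Strings stand in for Spark Columns here.
    val selected = expandColumns[String](columns, fields)(c => c)(c => s"udf($c) AS ${c}_processed")
    println(selected.mkString(", "))
  }
}
```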
Scala FastParse Library Error
I am trying to learn the Scala FastParse library. Towards this I have written the following code:
import fastparse.noApi._
import fastparse.WhitespaceApi
object FastParsePOC {
  val White = WhitespaceApi.Wrapper {
    import fastparse.all._
    NoTrace(" ".rep)
  }
  def print(input: Parsed[String]): Unit = {
    input match {
      case Parsed.Success(value, index) => println(s"Success: $value $index")
      case f @ Parsed.Failure(error, line, col) => println(s"Error: $error $line $col ${f.extra.traced.trace}")
    }
  }
  def main(args: Array[String]): Unit = {
    import White._
    val parser = P("Foo" ~ "(" ~ AnyChar.rep(1).! ~ ")")
    val input1 = "Foo(Bar(10), Baz(20))"
    print(parser.parse(input1))
  }
}
But I get this error:
Error: ")" 21 Extra(Foo(Bar(10), Baz(20)), [traced - not evaluated]) parser:1:1 / (AnyChar | ")"):1:21 ...""
My expected output was "Bar(10), Baz(20)". It seems the parser above does not like the ending ")".
AnyChar.rep(1) also consumes the ) symbol at the end of the input string; as a result, the final ")" in ~ ")" is never reached, because the input is already exhausted. If the ) symbol weren't used inside Bar and Baz, this could be solved by excluding ) from AnyChar like this:
val parser = P("Foo" ~ "(" ~ (!")" ~ AnyChar).rep(1).! ~ ")")
val input1 = "Foo(Bar(10*, Baz(20*)"
To make Bar and Baz work with the ) symbol, you could define separate parsers for each of them (also excluding ) from AnyChar). The following solution is a bit more flexible, as it allows more occurrences of Bar and Baz, but I hope that you get the idea.
val bar = P("Bar" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val baz = P("Baz" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val parser = P("Foo" ~ "(" ~ (bar | baz).rep(sep = ",").! ~ ")")
val input1 = "Foo(Bar(10), Baz(20))"
print(parser.parse(input1))
Result:
Success: Bar(10), Baz(20) 21
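If the arguments may nest parentheses arbitrarily, listing Bar and Baz explicitly stops scaling; the usual fix is to describe balanced parentheses instead. The following is a library-free sketch of that idea (a depth counter, not fastparse code); BalancedArgs and extractArgs are hypothetical names, and the helper returns the text between Foo( and its matching ).

```scala
object BalancedArgs {
  // Returns the text between "<name>(" and its matching ")", counting nested parentheses.
  def extractArgs(input: String, name: String = "Foo"): Option[String] = {
    val open = name + "("
    if (!input.startsWith(open)) None
    else {
      val start = open.length
      var depth = 1
      var i = start
      while (i < input.length && depth > 0) {
        input.charAt(i) match {
          case '(' => depth += 1
          case ')' => depth -= 1
          case _   => ()
        }
        i += 1
      }
      // depth == 0 means the matching ")" was found at index i - 1.
      if (depth == 0) Some(input.substring(start, i - 1)) else None
    }
  }
}
```

In a parser library the same idea is usually expressed as a recursive rule (a parenthesized group contains plain characters or further groups), which is what the counter emulates.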
Stack overflow when using parser combinators
import scala.util.parsing.combinator._
object ExprParser extends JavaTokenParsers {
  lazy val name: Parser[_] = "a" ~ rep("a" | "1") | function_call
  lazy val function_call = name ~ "(" ~> name <~ ")"
}
recurses indefinitely for function_call.parseAll("aaa(1)"). Obviously, it is because 1 cannot start a name, so name falls through to function_call, which calls name again at the same position, which enters function_call again, and so on. How do you resolve such situations?
There was a solution to reduce name to a simple identifier
def name = rep1("a" | "1")
def function_call = name ~ "(" ~ (function_call | name) ~ ")"
but I prefer not to do this because name ::= identifier | function_call is BNF-ed in the VHDL specification and function_call is probably shared elsewhere. The left-recursion elimination found here is undesirable for the same reason:
def name: Parser[_] = "a" ~ rep("a" | "1") ~ pared_name
def pared_name: Parser[_] = "(" ~> name <~ ")" | ""
BTW, I also wonder: if I fix the error, will name.parseAll consume only "aaa", as the first alternative in the name rule, or take the whole "aaa(1)"? How can I make name consume the whole aaa(1) rather than only aaa? I guess that I should put function_call as the first alternative in name, but won't it overflow the stack even more eagerly in this case?
An easy solution is to use the packrat parser:
object ExprParser extends JavaTokenParsers with PackratParsers {
  lazy val name: PackratParser[_] = "a" ~ rep("a" | "1") | function_call
  lazy val function_call: PackratParser[_] = name ~ "(" ~> name <~ ")"
}
Output (the parse now terminates instead of overflowing the stack; it still fails at column 5 because 1 is not a valid name in this grammar):
scala> ExprParser.parseAll(ExprParser.function_call, "aaa(1)")
res0: ExprParser.ParseResult[Any] = [1.5] failure: Base Failure
aaa(1)
    ^
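On the question's second point (does name stop at aaa or take the whole call?), a longest-match reading can be sketched without the library by growing the match in a loop, similar in spirit to the seed-growing approach PackratParsers uses for left recursion. This is a hand-written recognizer for the question's grammar, not scala-parser-combinators code, and it does restructure the rule, which the asker wanted to avoid in the real grammar; NameRecognizer is a hypothetical name.

```scala
object NameRecognizer {
  // Grammar from the question: name ::= "a" ("a" | "1")* | name "(" name ")"
  // Returns the end index of the longest name starting at `pos`, or None.
  def name(in: String, pos: Int): Option[Int] =
    if (pos >= in.length || in.charAt(pos) != 'a') None
    else {
      var stop = pos + 1
      while (stop < in.length && (in.charAt(stop) == 'a' || in.charAt(stop) == '1')) stop += 1
      // Grow the match: each "(" name ")" suffix turns the current name into a function_call.
      var grown = true
      while (grown) {
        grown = false
        if (stop < in.length && in.charAt(stop) == '(') {
          name(in, stop + 1) match {
            case Some(argEnd) if argEnd < in.length && in.charAt(argEnd) == ')' =>
              stop = argEnd + 1
              grown = true
            case _ => // no valid argument list here; keep the current match
          }
        }
      }
      Some(stop)
    }

  // True when the whole input is a single name (identifier or function call).
  def parseAll(in: String): Boolean = name(in, 0).contains(in.length)
}
```

Under this reading, name consumes the whole aaa(a1), but on aaa(1) it stops after aaa, consistent with the packrat failure shown above.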