Scala parser combinator for Logo list? - scala

I am trying to write a token-based Scala parser for UCB Logo. The problem I am facing is that in UCB Logo the values in a list can be delimited only by one of ']', '[', ' '. If any other kind of delimiter appears, the content of the list should be treated as a word.
In short, how can I make a token parser that will consider the following:
[ 4 3 2 ] - should be a list
[ [ 4 3 2 ] ] - should be a list within a list
[ 1 + 2 ] - should be a word inside a list
[ [ 1 2 3 ] + ] - should be a word inside a list
The following
'[' ~ rep(chrExcept('[', ']')) ~ ']'
produces these tokens:
Tokens: List([, [1 2 3], +, ])
from [ [ 1 2 3 ] + ]. I believe it should produce the tokens:
List([, [1 2 3] +, ]) -> merge the + sign with the token [1 2 3].
This is the current code of the Lexical I am using:
package lexical
import scala.language.postfixOps
import scala.util.parsing.combinator.lexical.Lexical
import scala.util.parsing.input.CharSequenceReader._
/**
* Created by Marin on 28/03/16.
*/
class MyLexical extends Lexical with MyTokens {
def token: Parser[Token] = (
//procDef ^^ { case first ~ chars => processNewProcedure(chars mkString "") }
word2 ^^ { case rest => {
/*val s = if (second.isEmpty) "" else second mkString ""
val t = if(third.isEmpty) "" else third mkString ""
val f = if(fourth.isEmpty) "" else fourth mkString ""
StringLit(s"$first$s$t$f$rest")*/
println(rest)
StringLit("Smth")
}
}
| formalChar ~ rep(identChar | digit) ^^ { case first ~ rest => Formal(first :: rest mkString "") }
| identChar ~ rep(identChar | digit) ^^ { case first ~ rest => processIdent(first :: rest mkString "") }
| procDigit ^^ { case first ~ second ~ rest => NumericLit((first mkString "") :: second.getOrElse("") :: rest mkString "") }
| '\"' ~ rep(chrExcept('\"', EofCh)) ~ ' ' ^^ { case '\"' ~ chars ~ ' ' => StringLit(chars mkString "") }
| EofCh ^^^ EOF
| delim
| failure("Illegal character")
)
def processNewProcedure(chars: String) =
if(reserved.contains(chars)) throw new RuntimeException
else {
Identifier(chars)
}
def procDef = toSeq ~> identChar ~ rep(identChar | elem('_')) <~ formalChar.* <~ endSeq
def toSeq = 't' ~ 'o' ^^^ "to"
def endSeq = 'e' ~ 'n' ~ 'd' ^^^ "end"
def processIdent(name: String) = {
if (reserved contains name) {
Keyword(name)
} else {
Identifier(name)
}
}
def word = {
'[' ~ ((whitespaceChar | digit)*) ~ (_delim | identChar) ~ rep(whitespaceChar | digit) ~ ']'
}
def word2 = {
//'[' ~> rep(whitespaceChar | digit) ~> rep(_delim | identChar) <~ rep(whitespaceChar | digit) <~ ']'
//'[' ~ rep(chrExcept('[', ']')) ~ ']'
rep1('[') ~ rep1(chrExcept('[', ']') | digit) ~ rep(_delim) ~ rep1(']')
//rep1('[') ~ identChar ~ rep(']') ~ rep('+') ~ rep1(']')
//'[' ~ (_delim | chrExcept('[', ']')) ~ ']'
}
def word3 = {
'[' ~> rep(digit | letter | _delim) <~ ']'
}
def procDigit = digit.+ ~ '.'.? ~ digit.*
def identChar = letter | elem('_')
def formalChar = ':' ~ identChar
override def whitespace: Parser[Any] = rep[Any] (
whitespaceChar
| ';' ~ comment
)
def comment: Parser[Any] = rep(chrExcept(EofCh, ';')) ^^ { case _ => ' ' }
/****** Pure copy-paste ******/
/** The set of reserved identifiers: these will be returned as `Keyword`s. */
val reserved = new scala.collection.mutable.HashSet[String]
/** The set of delimiters (ordering does not matter). */
val delimiters = new scala.collection.mutable.HashSet[String]
private lazy val _delim: Parser[Token] = {
// construct parser for delimiters by |'ing together the parsers for the individual delimiters,
// starting with the longest one -- otherwise a delimiter D will never be matched if there is
// another delimiter that is a prefix of D
def parseDelim(s: String): Parser[Token] = accept(s.toList) ^^ { x => Keyword(s) }
val d = new Array[String](delimiters.size)
delimiters.copyToArray(d, 0)
scala.util.Sorting.quickSort(d)
(d.toList map parseDelim).foldRight(failure("no matching delimiter"): Parser[Token])((x, y) => y | x)
}
protected def delim: Parser[Token] = _delim
}

Related

Scala replace function with comma as ","

In the input below, I have to replace the last comma (,) between two colons (:) with ","
println(input)
//[level:1,File:one,three,Flag:NA][level:1,File:two,Flag:NA]
println(input.replace(",", "\",\""))
getting result as:
//[level:1","File:one","three","Flag:NA][level:1","File:two","Flag:NA]
expected result should be
[level:1","File:one,three","Flag:NA][level:1","File:two","Flag:NA]
Kindly help me.
val str1 = "[level:1,File:one,three,Flag:NA][level:1,File:two,Flag:NA]"
val regex1 = raw"(,)(\w+:)".r
val matches = regex1.findAllMatchIn(str1)
val str2 = matches.foldLeft(str1)({ case (str, m) =>
str.replaceFirst(m.group(0), "\",\"" + m.group(2))
})
// str2: String = [level:1","File:one,three","Flag:NA][level:1","File:two","Flag:NA]
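The same result can probably be reached in a single pass with a lookahead, so that only the comma itself is matched and replaced. This is a minimal sketch, assuming every key is a run of word characters followed by a colon; the names `CommaQuoting` and `quoteKeySeparators` are illustrative and not part of the original answer.

```scala
object CommaQuoting {
  // Replace a comma only when it is directly followed by `key:`.
  // The lookahead (?=\w+:) tests for the key without consuming it,
  // so commas inside a value (e.g. "one,three") are left alone.
  def quoteKeySeparators(input: String): String =
    input.replaceAll(",(?=\\w+:)", "\",\"")
}
```

Calling `CommaQuoting.quoteKeySeparators(str1)` should give the same string as `str2` above, without the fold over individual matches.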

Scala: How to apply the pattern matching function to a csv file which contains the below mentioned values

def matchcase(x:String):Int = x match{
case "Iris-setosa" => 10
case "Iris-virginica" => 20
case "Iris-versicolor"=> 30
case _ => 0
}
The sample data in the csv file is as below:
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
51,7,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
53,6.9,3.1,4.9,1.5,Iris-versicolor
103,7.1,3,5.9,2.1,Iris-virginica
104,6.3,2.9,5.6,1.8,Iris-virginica
105,6.5,3,5.8,2.2,Iris-virginica
In your comment you've mentioned
Actually i want to replace the text in the whole file by using the matchcase function
And assuming that you have data and function as mentioned in the question, you can use scala.io.Source.fromFile to read the file
val data = scala.io.Source.fromFile("input file path")
call the matchcase function you have written
val replacedData = data.getLines().map(_.split(",")).map(array => array.init.mkString(",")+","+matchcase(array.last))
and finally write the output
new java.io.PrintWriter("path to output file") { write(replacedData.mkString("\n")); close() }
You should have a file with following data
1,5.1,3.5,1.4,0.2,10
2,4.9,3,1.4,0.2,10
3,4.7,3.2,1.3,0.2,10
51,7,3.2,4.7,1.4,30
52,6.4,3.2,4.5,1.5,30
53,6.9,3.1,4.9,1.5,30
103,7.1,3,5.9,2.1,20
104,6.3,2.9,5.6,1.8,20
105,6.5,3,5.8,2.2,20
I hope the answer is helpful
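To check the transformation without touching any files, the per-line step can be pulled out into its own function. This is only a sketch under the assumption that the species name is always the last comma-separated field; `IrisRecode`, `recodeLine` and `recode` are hypothetical names, and `matchcase` is copied from the question.

```scala
object IrisRecode {
  // Mapping from the question.
  def matchcase(x: String): Int = x match {
    case "Iris-setosa"     => 10
    case "Iris-virginica"  => 20
    case "Iris-versicolor" => 30
    case _                 => 0
  }

  // Replace the last comma-separated field of one line with its numeric code.
  def recodeLine(line: String): String = {
    val fields = line.split(",")
    (fields.init :+ matchcase(fields.last).toString).mkString(",")
  }

  // Skip blank lines and recode every remaining line.
  def recode(lines: Iterator[String]): Iterator[String] =
    lines.filter(_.trim.nonEmpty).map(recodeLine)
}
```

`IrisRecode.recode(data.getLines())` could then feed the writer. In real use both the `Source` and the `PrintWriter` should be closed once the output is written.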
Use extractor pattern in Scala with regex
Given regex
val regex = "[[0-9]|,|\\.]+([[a-zA-z]|\\-]+)".r
Now just pattern match the whole text, line by line
lines.map {
case regex(name) => Some(name)
case _ => None
}
Later, use your matchcase function to convert the strings (names) into numbers
and replace each name with its number.
The code you are looking for:
lines.map { line =>
line match {
// String.replace is literal (not a regex), so substitute the captured name itself
case regex(name) => line.replace(name, matchcase(name).toString)
case _ => line
}
}
Scala REPL
scala> val regex = "[[0-9]|,|\\.]+([[a-zA-z]|\\-]+)".r
regex: scala.util.matching.Regex = [[0-9]|,|\.]+([[a-zA-z]|\-]+)
scala> "1,5.1,3.5,1.4,0.2,Iris-setosa" match { case regex(str) => println(s"name: $str")}
name: Iris-setosa
Next
Processing whole text
scala> val text = """
| 1,5.1,3.5,1.4,0.2,Iris-setosa
| 2,4.9,3,1.4,0.2,Iris-setosa
| 3,4.7,3.2,1.3,0.2,Iris-setosa
| 51,7,3.2,4.7,1.4,Iris-versicolor
| 52,6.4,3.2,4.5,1.5,Iris-versicolor
| 53,6.9,3.1,4.9,1.5,Iris-versicolor
| 103,7.1,3,5.9,2.1,Iris-virginica
| 104,6.3,2.9,5.6,1.8,Iris-virginica
| 105,6.5,3,5.8,2.2,Iris-virginica
| """.stripMargin
text: String =
"
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
51,7,3.2,4.7,1.4,Iris-versicolor
52,6.4,3.2,4.5,1.5,Iris-versicolor
53,6.9,3.1,4.9,1.5,Iris-versicolor
103,7.1,3,5.9,2.1,Iris-virginica
104,6.3,2.9,5.6,1.8,Iris-virginica
105,6.5,3,5.8,2.2,Iris-virginica
"
scala> val lines = text.split("\n").filter(_.trim.nonEmpty)
lines: Array[String] = Array(1,5.1,3.5,1.4,0.2,Iris-setosa, 2,4.9,3,1.4,0.2,Iris-setosa, 3,4.7,3.2,1.3,0.2,Iris-setosa, 51,7,3.2,4.7,1.4,Iris-versicolor, 52,6.4,3.2,4.5,1.5,Iris-versicolor, 53,6.9,3.1,4.9,1.5,Iris-versicolor, 103,7.1,3,5.9,2.1,Iris-virginica, 104,6.3,2.9,5.6,1.8,Iris-virginica, 105,6.5,3,5.8,2.2,Iris-virginica)
scala> lines.map {
| case regex(name) => Some(name)
| case _ => None
| }
res18: Array[Option[String]] = Array(Some(Iris-setosa), Some(Iris-setosa), Some(Iris-setosa), Some(Iris-versicolor), Some(Iris-versicolor), Some(Iris-versicolor), Some(Iris-virginica), Some(Iris-virginica), Some(Iris-virginica))
Now use collect and only collect values which are Some
scala> res18.collect { case Some(value) => value }
res19: Array[String] = Array(Iris-setosa, Iris-setosa, Iris-setosa, Iris-versicolor, Iris-versicolor, Iris-versicolor, Iris-virginica, Iris-virginica, Iris-virginica)
scala> res19.mkString("\n")
res20: String =
Iris-setosa
Iris-setosa
Iris-setosa
Iris-versicolor
Iris-versicolor
Iris-versicolor
Iris-virginica
Iris-virginica
Iris-virginica
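If the goal is to rewrite each line rather than only extract the names, an extractor with two capture groups can keep the numeric prefix and swap the name in one step. This is a sketch, not the answer's own code: the pattern `Row` is a simplified assumption about the line shape (anything, a final comma, then letters and hyphens), and `IrisExtractor` and `recode` are hypothetical names. The mapping is passed in as a function so the block stands alone.

```scala
object IrisExtractor {
  // Group 1: everything up to and including the last comma.
  // Group 2: the species name (letters and hyphens only).
  private val Row = "(.*,)([A-Za-z-]+)".r

  // Rewrite the trailing name with its code; leave non-matching lines as they are.
  def recode(line: String, code: String => Int): String = line match {
    case Row(prefix, name) => prefix + code(name)
    case _                 => line
  }
}
```

With the `matchcase` function from the question, `lines.map(IrisExtractor.recode(_, matchcase))` would produce the recoded lines.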

Return two columns when mapping through a column list Spark SQL Scala

I want to programmatically give a certain number of fields and for some fields, select a column and pass that field to another function that will return a case class of string, string. So far I have
val myList = Seq(("a", "b", "c", "d"), ("aa", "bb", "cc","dd"))
val df = myList.toDF("col1","col2","col3","col4")
val fields= "col1,col2"
val myDF = df.select(df.columns.map(c => if (fields.contains(c)) { df.col(s"$c") && someUDFThatReturnsAStructTypeOfStringAndString(df.col(s"$c")).alias(s"${c}_processed") } else { df.col(s"$c") }): _*)
Right now this is giving me the exception
org.apache.spark.sql.AnalysisException: cannot resolve '(col1 AND UDF(col1))' due to data type mismatch: differing types in '(col1 AND UDF(col1))' (string and struct< STRING1:string,STRING2:string > )
I want to select
col1 | < col1.String1, col1.String2 > | col2 | < col2.String1,col2.String2 > | col3 | col4
"a" | < "a1", "a2" > | "b" | < "b1", "b2" > | "c" | "d"
I ended up using the df.selectExpr and tying together a bunch of expressions.
import org.apache.spark.sql.functions.expr
import spark.implicits._
val fields = "col1,col2".split(",")
val exprToSelect = df.columns.filter(c => fields.contains(c)).map(c => s"someUDFThatReturnsAStructTypeOfStringAndString(${c}) as ${c}_parsed") ++ df.columns
val exprToFilter = df.columns.filter(c => fields.contains(c)).map(c => s"length(${c}_parsed.String1) > 1").reduce(_ + " OR " + _) //error
val exprToFilter2 = df.columns.filter(c => fields.contains(c)).map(c => s"(length(${c}_parsed.String1) < 1)").reduce(_ + " AND " + _) //valid
val exprToSelectValid = df.columns.filter(c => fields.contains(c)).map(c => s"${c}_parsed.String2 as ${c}") ++ df.columns.filterNot(c => fields.contains(c)) //valid
val exprToSelectInValid = Array("concat(" + df.columns.filter(c => fields.contains(c)).map(c => s"${c}_parsed.String1").mkString(", ") + ") as String1") ++ df.columns
val parsedDF = df.select(exprToSelect.map { c => expr(s"$c")}: _ *)
val validDF = parsedDF.filter(exprToFilter2)
.select(exprToSelectValid.map { c => expr(s"$c")}: _ *)
val errorDF = parsedDF.filter(exprToFilter)
.select(exprToSelectInValid.map { c => expr(s"$c")}: _ *)

Scala FastParse Library Error

I am trying to learn the Scala FastParse library. To that end I have written the following code:
import fastparse.noApi._
import fastparse.WhitespaceApi
object FastParsePOC {
val White = WhitespaceApi.Wrapper{
import fastparse.all._
NoTrace(" ".rep)
}
def print(input : Parsed[String]): Unit = {
input match {
case Parsed.Success(value, index) => println(s"Success: $value $index")
case f @ Parsed.Failure(error, line, col) => println(s"Error: $error $line $col ${f.extra.traced.trace}")
}
}
def main(args: Array[String]) : Unit = {
import White._
val parser = P("Foo" ~ "(" ~ AnyChar.rep(1).! ~ ")")
val input1 = "Foo(Bar(10), Baz(20))"
print(parser.parse(input1))
}
}
But I get the error
Error: ")" 21 Extra(Foo(Bar(10), Baz(20)), [traced - not evaluated]) parser:1:1 / (AnyChar | ")"):1:21 ...""
My expected output was "Bar(10), Baz(20)". It seems the parser above does not like the ending ")".
AnyChar.rep(1) is greedy, so it also consumes the ) symbol at the end of the input string. As a result, nothing is left for the final ~ ")" to match and the parse fails there.
If the ) symbol weren't used inside Bar and Baz, this could be solved by excluding ) from AnyChar like this:
val parser = P("Foo" ~ "(" ~ (!")" ~ AnyChar).rep(1).! ~ ")")
val input1 = "Foo(Bar(10*, Baz(20*)"
To make Bar and Baz work with the ) symbol, you could define a separate parser for each of them (again excluding the ) symbol from AnyChar). The following solution is a bit more flexible, as it allows more occurrences of Bar and Baz, but I hope that you get the idea.
val bar = P("Bar" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val baz = P("Baz" ~ "(" ~ (!")" ~ AnyChar).rep(1) ~ ")")
val parser = P("Foo" ~ "(" ~ (bar | baz).rep(sep = ",").! ~ ")")
val input1 = "Foo(Bar(10), Baz(20))"
print(parser.parse(input1))
Result:
Success: Bar(10), Baz(20) 21
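The underlying issue is that the closing ) of Foo can only be identified by tracking nesting. The sketch below shows that idea in plain Scala with a depth counter; it deliberately does not use FastParse, and `OuterArgs` and `extract` are hypothetical names introduced only for illustration.

```scala
object OuterArgs {
  // Return the text between `name(` and its matching `)`, or None when the
  // input does not start with `name(` or the parentheses never balance.
  def extract(name: String, input: String): Option[String] = {
    val prefix = name + "("
    if (!input.startsWith(prefix)) None
    else {
      var depth = 1            // already inside the opening "(" of `name(`
      var i = prefix.length
      while (i < input.length && depth > 0) {
        input.charAt(i) match {
          case '(' => depth += 1
          case ')' => depth -= 1
          case _   =>
        }
        i += 1
      }
      // depth == 0 means input(i - 1) is the ")" that closes `name(`.
      if (depth == 0) Some(input.substring(prefix.length, i - 1)) else None
    }
  }
}
```

Unlike the parser above, this sketch does not require the closing ) to be the last character of the input and does not restrict the content to Bar and Baz; it only shows why a greedy "any character" match cannot find the right ) on its own.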

Stack overflow when using parser combinators

import scala.util.parsing.combinator._
object ExprParser extends JavaTokenParsers {
lazy val name: Parser[_] = "a" ~ rep("a" | "1") | function_call
lazy val function_call = name ~ "(" ~> name <~ ")"
}
This recurses indefinitely for function_call.parseAll("aaa(1)"). Obviously, this is because 1 cannot enter name, and name enters function_call, which tries name, which enters function_call again. How do you resolve such situations?
There was a solution to reduce name to simple identifier
def name = rep1("a" | "1")
def function_call = name ~ "(" ~ (function_call | name) ~ ")"
but I prefer not to do this because name ::= identifier | function_call is BNF-ed in VHDL specification and function_call is probably shared elsewhere. The left recursion elimination found here is undesirable for the same reason
def name: Parser[_] = "a" ~ rep("a" | "1") ~ pared_name
def pared_name: Parser[_] = "(" ~> name <~ ")" | ""
BTW, I also wonder: if I fix the error, will name.parseAll consume only "aaa", as the first alternative in the name rule, or take the whole "aaa(1)"? How can I make name try to consume the whole aaa(1) before consuming only aaa? I guess that I should put function_call as the first alternative in name, but will it overflow the stack even more eagerly in that case?
An easy solution is to use the packrat parser:
object ExprParser extends JavaTokenParsers with PackratParsers {
lazy val name: PackratParser[_] = "a" ~ rep("a" | "1") | function_call
lazy val function_call: PackratParser[_] = name ~ "(" ~> name <~ ")"
}
Output:
scala> ExprParser.parseAll(ExprParser.function_call, "aaa(1)")
res0: ExprParser.ParseResult[Any] =
[1.5] failure: Base Failure
aaa(1)
^