How should I match a pattern in Scala? - scala

I need to match a pattern in Scala. This is my code:
object Wykonaj{
val doctype = DocType("html", PublicID("-//W3C//DTD XHTML 1.0 Strict//EN","http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"), Nil)
def main(args: Array[String]) {
val theUrl = "http://axv.pl/rss/waluty.php"
val xmlString = Source.fromURL(new URL(theUrl)).mkString
val xml = XML.loadString(xmlString)
val zawartosc= (xml \\ "description")
val pattern="""<descrition> </descrition>""".r
for(a <-zawartosc) yield a match{
case pattern=>println(pattern)
}
}
}
The problem is that I need to define val pattern so that, from
<description><![CDATA[ <img src="http://youbookmarks.com/waluty/pic/waluty/AUD.gif"> dolar australijski 1AUD | 2,7778 | 210/A/NBP/2010 ]]> </description>
I get only dolar australijski 1AUD | 2,7778 | 210/A/NBP/2010.

Try
import scala.util.matching.Regex
//...
val Pattern = new Regex(""".*; ([^<]*) </description>""")
//...
for(a <-zawartosc) yield a match {
case Pattern(p) => println(p)
}
It's a bit of a kludge (I don't use REs with Scala very often), but it seems to work. The CDATA is stringified with &gt; entities, so the RE looks for the text after a semicolon and before the closing description tag.

val zawartosc = (xml \\ "description")
val pattern = """.*(dolar australijski.*)""".r
val allMatches = (for (a <- zawartosc; text = a.text) yield {text}) collect {
case pattern(value) => value }
val result = allMatches.headOption // or .head
This is mostly a matter of using the right regular expression. In this case you want to match the string that contains dolar australijski. The pattern has to allow for extra characters before dolar, so it starts with .*. The parentheses then mark the start and end of the part you need. Refer to the Java API documentation for java.util.regex.Pattern for the full syntax.
With respect to the for comprehension, I convert each XML element to text before matching, and then use the collect method to keep only the texts that match the pattern. The desired result should then be the first and only element.
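A minimal, dependency-free sketch of the collect approach above, run against plain strings instead of XML nodes. The sample texts and the second, more general pattern (AfterLastTag, which captures whatever follows the last > on the line) are assumptions for illustration, not part of the original answer.

```scala
object CollectMatchDemo {
  // Stand-ins for the text of each <description> element (assumed sample data).
  val texts: Seq[String] = Seq(
    """ <img src="http://youbookmarks.com/waluty/pic/waluty/AUD.gif"> dolar australijski 1AUD | 2,7778 | 210/A/NBP/2010 """,
    "Kursy walut"
  )

  // Pattern from the answer: anything, then capture from "dolar australijski" onwards.
  val pattern = """.*(dolar australijski.*)""".r

  // Hypothetical generalisation: capture the trimmed text after the last '>'.
  val AfterLastTag = """.*>\s*(.*\S)\s*""".r

  // A Regex used as an extractor must match the whole string;
  // collect keeps only the elements for which the case is defined.
  val byName: Option[String] =
    texts.collect { case pattern(value) => value.trim }.headOption
  val byTag: Option[String] =
    texts.collect { case AfterLastTag(value) => value }.headOption

  def main(args: Array[String]): Unit = {
    println(byName)
    println(byTag)
  }
}
```

Using headOption rather than head keeps the result safe when no element matches.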

Related

Parse CSV and add only matching rows to List functionally in Scala

I am reading a CSV file in Scala.
Person is a case class:
case class Person(name: String, address: String)
def getData(path: String, existingName: String): List[Person] = {
Source.fromFile("my_file_path").getLines.drop(1).map(l => {
val data = l.split("|", -1).map(_.trim).toList
val personName = data(0)
if(personName.equalsIgnoreCase(existingName)) {
val address=data(1)
Person(personName,address)
//here I want to add to list
}
else
Nil
///here return empty list of zero length
}).toList()
}
I want to achieve this functionally in scala.
Here's the basic approach to what I think you're trying to do.
case class Person(name:String, address:String)
def getData(path:String, existingName:String) :List[Person] = {
val recordPattern = raw"\s*(?i)($existingName)\s*\|\s*(.*)".r.unanchored
io.Source.fromFile(path).getLines.drop(1).collect {
case recordPattern(name,addr) => Person(name, addr.trim)
}.toList
}
This doesn't close the file reader or report the error if the file can't be opened, which you really should do, but we'll leave that for a different day.
update: added file closing and error handling via Using (Scala 2.13)
import scala.util.{Using, Try}
case class Person(name:String, address:String)
def getData(path:String, existingName:String) :Try[List[Person]] =
Using(io.Source.fromFile(path)){ file =>
val recordPattern = raw"\s*(?i)($existingName)\s*\|\s*([^|]*)".r
file.getLines.drop(1).collect {
case recordPattern(name,addr) => Person(name, addr.trim)
}.toList
}
updated update
OK. Here's a version that:
reports the error if the file can't be opened
closes the file after it's been opened and read
ignores unwanted spaces and quote marks
is pre-2.13 compiler friendly
import scala.util.Try
case class Person(name:String, address:String)
def getData(path:String, existingName:String) :List[Person] = {
val recordPattern =
raw"""[\s"]*(?i)($existingName)["\s]*\|[\s"]*([^"|]*).*""".r
val file = Try(io.Source.fromFile(path))
val res = file.fold(
err => {println(err); List.empty[Person]},
_.getLines.drop(1).collect {
case recordPattern(name,addr) => Person(name, addr.trim)
}.toList)
file.map(_.close)
res
}
And here's how the regex works:
[\s"]* there might be spaces or quote marks
(?i) matching is case-insensitive
($existingName) match and capture this string (1st capture group)
["\s]* there might be spaces or quote marks
\| there will be a bar character
[\s"]* there might be spaces or quote marks
([^"|]*) match and capture everything that isn't quote or bar
.* ignore anything that might come thereafter
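The breakdown above can be checked against a few in-memory lines. This is only a sketch: the sample lines and the name "alice" are assumed, and existingName is interpolated into the regex unescaped, so a name containing regex metacharacters would need java.util.regex.Pattern.quote first.

```scala
object RecordPatternDemo {
  case class Person(name: String, address: String)

  val existingName = "alice" // assumed search name

  // Same shape as the pattern explained above, ending in .* to ignore trailing fields.
  val recordPattern =
    raw"""[\s"]*(?i)($existingName)["\s]*\|[\s"]*([^"|]*).*""".r

  // Assumed sample records: quoted with a third field, non-matching, and upper-case.
  val lines: List[String] = List(
    """ "Alice" | "12 Main St" | extra""",
    """Bob|34 Side Rd""",
    """ALICE|56 High St"""
  )

  // collect keeps only the lines that the pattern matches in full.
  val people: List[Person] = lines.collect {
    case recordPattern(name, addr) => Person(name, addr.trim)
  }

  def main(args: Array[String]): Unit = println(people)
}
```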
You were not very clear about what the problem with your approach was, but this should do the trick (it is very close to what you have):
def getData(path:String, existingName: String) : List[Person] = {
val source = Source.fromFile(path)
val lst = source.getLines.drop(1).flatMap(l => {
// split takes a regex, so the bar has to be escaped
val data = l.split("\\|", -1).map(_.trim).toList
val personName = data.head
if (personName.equalsIgnoreCase(existingName)) {
val address = data(1)
Option(Person(personName, address))
}
else
Option.empty
}).toList
source.close()
lst
}
We read the file line by line. For each line we extract personName from the first CSV field; if it is the one we are looking for we return an Option containing the Person, otherwise an empty one (Option.empty). flatMap then discards the empty options, which avoids having to deal with Nil.
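The same flatMap idea can be tried without a file. In this sketch the helper parseLine and the sample lines are assumptions. Note that String.split takes a regular expression, so a literal bar has to be written as "\\|".

```scala
object FlatMapFilterDemo {
  case class Person(name: String, address: String)

  // Hypothetical helper: Some(person) when the first field matches, None otherwise.
  def parseLine(line: String, existingName: String): Option[Person] =
    line.split("\\|", -1).map(_.trim).toList match {
      case name :: address :: _ if name.equalsIgnoreCase(existingName) =>
        Some(Person(name, address))
      case _ => None
    }

  // Assumed sample data; the first line plays the role of the CSV header.
  val lines: List[String] =
    List("name|address", "Alice| 12 Main St", "Bob|34 Side Rd", "alice|56 High St")

  // flatMap drops the None results and unwraps the Some results.
  val people: List[Person] = lines.drop(1).flatMap(parseLine(_, "alice"))

  def main(args: Array[String]): Unit = println(people)
}
```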

Idiomatic way of extracting known key-> val pairs from a string

Example context:
An HTTP Response with a body as follows:
key1=val1&key2=val2&key3=val3.
The names of the keys are always known.
Currently the extraction is done with regex:
val params = response split ("""&""") map { _.split("""=""") } map { el => { el(0) -> el(1) } } toMap;
Is there a simpler way of pattern matching the response for specific params?
I think using split is probably going to be the fastest/simplest solution here. You're not doing any advanced parsing, so using parser combinators or regex capture groups seems a little overkill.
However, when you have complex expressions involving multiple calls to map, filter, etc., it's usually an indicator that you can clean things up with a for-comprehension:
val response = "key1=val1&key2=val2&key3=val3"
val params = (for { x <- response split ("&")
Array(k, v) = x split ("=") }
yield k->v).toMap
You can use parser combinators here for most flexibility and robustness (i.e., handle failed parsing):
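import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader
object Parser extends RegexParsers with App {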
def lit: Parser[String] = "[^=&]+".r
def pair: Parser[(String, String)] = lit ~ "=" ~ lit ^^ {
case key ~ "=" ~ value => key -> value
}
def parse: Parser[Seq[(String, String)]] = repsep(pair, "&")
val response = "key1=val1&key2=val2&key3=val3"
val params = parse(new CharSequenceReader(response)).get.toMap
println(params)
}
You can use regexp as a matcher like this:
val r = "([^=]+)=([^=]+)".r
def toKv(s:String) = s match {
case r(k,v) => (k,v)
case _ => throw new IllegalArgumentException(s"Invalid key=value pair: $s")
}
So, for your case it would look like:
response split ("&") map (toKv)
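If a malformed pair should be skipped rather than raise an exception, the regex extractor can be combined with collect. This is a sketch under that assumption; the names KeyValueDemo, Pair and parseParams, and the sample response, are made up for illustration.

```scala
object KeyValueDemo {
  // Key: one or more characters other than '=' and '&'; value may be empty.
  private val Pair = "([^=&]+)=([^=&]*)".r

  // Pairs that do not match the pattern are silently dropped by collect.
  def parseParams(response: String): Map[String, String] =
    response.split("&").collect { case Pair(k, v) => k -> v }.toMap

  // Assumed sample body with one malformed entry and one empty value.
  val params: Map[String, String] =
    parseParams("key1=val1&key2=val2&broken&key3=")

  def main(args: Array[String]): Unit = {
    println(params)
    // Known keys can then be looked up safely as Options.
    println(params.get("key2"))
  }
}
```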

Internal DSL in Scala: Lists without ","

I'm trying to build an internal DSL in Scala to represent algebraic definitions. Let's consider this simplified data model:
case class Var(name:String)
case class Eq(head:Var, body:Var*)
case class Definition(name:String, body:Eq*)
For example a simple definition would be:
val x = Var("x")
val y = Var("y")
val z = Var("z")
val eq1 = Eq(x, y, z)
val eq2 = Eq(y, x, z)
val defn = Definition("Dummy", eq1, eq2)
I would like to have an internal DSL to represent such an equation in the form:
Dummy {
x = y z
y = x z
}
The closest I could get is the following:
Definition("Dummy") := (
"x" -> ("y", "z")
"y" -> ("x", "z")
)
The first problem I encountered is that I cannot have two implicit conversions for Definition and Var, hence Definition("Dummy"). The main problem, however, is the lists. I don't want to surround them with anything, e.g. (), and I also don't want their elements to be separated by commas.
Is what I want possible using Scala? If yes, can anyone show me an easy way of achieving it?
While Scala's syntax is powerful, it is not flexible enough to create arbitrary delimiters for symbols. Thus, there is no way to leave out the commas and separate the elements with spaces only.
Nevertheless, it is possible to use macros and parse a string with arbitrary content at compile time. It is not an "easy" solution, but one that works:
object AlgDefDSL {
import language.experimental.macros
import scala.reflect.macros.Context
implicit class DefDSL(sc: StringContext) {
def dsl(): Definition = macro __dsl_impl
}
def __dsl_impl(c: Context)(): c.Expr[Definition] = {
import c.universe._
val defn = c.prefix.tree match {
case Apply(_, List(Apply(_, List(Literal(Constant(s: String)))))) =>
def toAST[A : TypeTag](xs: Tree*): Tree =
Apply(
Select(Ident(typeOf[A].typeSymbol.companionSymbol), newTermName("apply")),
xs.toList
)
def toVarAST(varObj: Var) =
toAST[Var](c.literal(varObj.name).tree)
def toEqAST(eqObj: Eq) =
toAST[Eq]((eqObj.head +: eqObj.body).map(toVarAST(_)): _*)
def toDefAST(defObj: Definition) =
toAST[Definition](c.literal(defObj.name).tree +: defObj.body.map(toEqAST(_)): _*)
parsers.parse(s) match {
case parsers.Success(defn, _) => toDefAST(defn)
case parsers.NoSuccess(msg, _) => c.abort(c.enclosingPosition, msg)
}
}
c.Expr(defn)
}
import scala.util.parsing.combinator.JavaTokenParsers
private object parsers extends JavaTokenParsers {
override val whiteSpace = "[ \t]*".r
lazy val newlines =
opt(rep("\n"))
lazy val varP =
"[a-z]+".r ^^ Var
lazy val eqP =
(varP <~ "=") ~ rep(varP) ^^ {
case lhs ~ rhs => Eq(lhs, rhs: _*)
}
lazy val defHead =
newlines ~> ("[a-zA-Z]+".r <~ "{") <~ newlines
lazy val defBody =
rep(eqP <~ rep("\n"))
lazy val defEnd =
"}" ~ newlines
lazy val defP =
defHead ~ defBody <~ defEnd ^^ {
case name ~ eqs => Definition(name, eqs: _*)
}
def parse(s: String) = parseAll(defP, s)
}
case class Var(name: String)
case class Eq(head: Var, body: Var*)
case class Definition(name: String, body: Eq*)
}
It can be used with something like this:
scala> import AlgDefDSL._
import AlgDefDSL._
scala> dsl"""
| Dummy {
| x = y z
| y = x z
| }
| """
res12: AlgDefDSL.Definition = Definition(Dummy,WrappedArray(Eq(Var(x),WrappedArray(Var(y), Var(z))), Eq(Var(y),WrappedArray(Var(x), Var(z)))))
In addition to sschaef's nice solution I want to mention a few possibilities that are commonly used to get rid of commas in list construction for a DSL.
Colons
This might be trivial, but it is sometimes overlooked as a solution.
line1 ::
line2 ::
line3 ::
Nil
For a DSL it is often desirable that every line containing an instruction or data is terminated the same way (as opposed to List literals, where every line but the last gets a comma). With this solution, swapping lines can no longer mess up the trailing comma. Unfortunately, the Nil looks a bit ugly.
Fluent API
Another alternative that might be interesting for a DSL is something like this:
BuildDefinition()
.line1
.line2
.line3
.build
where each line is a member function of the builder (and returns a modified builder). This solution requires eventually converting the builder to a list (which might be done with an implicit conversion). Note that for some APIs it might be possible to pass around the builder instances themselves, and only extract the data wherever needed.
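As a rough sketch of the builder idea applied to the data model from the question: DefinitionBuilder and its equation method are hypothetical names, and an explicit build call is used here in place of an implicit conversion.

```scala
object FluentBuilderDemo {
  case class Var(name: String)
  case class Eq(head: Var, body: Var*)
  case class Definition(name: String, body: Eq*)

  // Immutable builder: every call returns a new builder holding one more equation.
  final case class DefinitionBuilder(name: String, eqs: Vector[Eq] = Vector.empty) {
    def equation(head: String, body: String*): DefinitionBuilder =
      copy(eqs = eqs :+ Eq(Var(head), body.map(Var(_)): _*))

    // Converts the accumulated equations into the final Definition.
    def build: Definition = Definition(name, eqs: _*)
  }

  val defn: Definition =
    DefinitionBuilder("Dummy")
      .equation("x", "y", "z")
      .equation("y", "x", "z")
      .build

  def main(args: Array[String]): Unit = println(defn)
}
```

Every line ends the same way, so lines can be reordered freely; the commas inside each call remain.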
Constructor API
Similarly another possibility is to exploit constructors.
new BuildInterface {
line1
line2
line3
}
Here, BuildInterface is a trait and we simply instantiate an anonymous class from the interface. The line functions call some member functions of this trait. Each invocation can internally update the state of the build interface. Note that this commonly results in a mutable design (but only during construction). To extract the list, an implicit conversion could be used.
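A possible shape for such a trait, again using the question's data model. The member names line and build are assumptions, and the mutable buffer is only touched while the anonymous class body runs.

```scala
object ConstructorApiDemo {
  case class Var(name: String)
  case class Eq(head: Var, body: Var*)
  case class Definition(name: String, body: Eq*)

  trait BuildInterface {
    // Initialised before the body of the anonymous subclass is executed.
    private val eqs = scala.collection.mutable.ListBuffer.empty[Eq]

    // Each call made in the subclass body records one equation.
    protected def line(head: String, body: String*): Unit = {
      eqs += Eq(Var(head), body.map(Var(_)): _*)
      ()
    }

    def build(name: String): Definition = Definition(name, eqs.toList: _*)
  }

  // The statements in the braces run as constructor code of the anonymous class.
  val builder: BuildInterface = new BuildInterface {
    line("x", "y", "z")
    line("y", "x", "z")
  }

  val defn: Definition = builder.build("Dummy")

  def main(args: Array[String]): Unit = println(defn)
}
```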
Since I don't understand the actual purpose of your DSL, I'm not really sure if any of these techniques is interesting for your scenario. I just wanted to add them since they are common ways to get rid of ",".
Here is another solution which is relatively simple and enables a syntax that is pretty close to your ideal (as others have pointed out, the exact syntax you asked for is not possible, in particular because you cannot redefine delimiter symbols).
My solution stretches what is reasonable a bit, because it adds an operator directly on scala.Symbol, but if you're going to use this DSL in a constrained scope then this should be OK.
object VarOps {
val currentEqs = new util.DynamicVariable( Vector.empty[Eq] )
}
implicit class VarOps( val variable: Var ) extends AnyVal {
import VarOps._
def :=[T]( body: Var* ) = {
val eq = Eq( variable, body:_* )
currentEqs.value = currentEqs.value :+ eq
}
}
implicit class SymbolOps( val sym: Symbol ) extends AnyVal {
def apply[T]( body: => Unit ): Definition = {
import VarOps._
currentEqs.withValue( Vector.empty[Eq] ) {
body
Definition( sym.name, currentEqs.value:_* )
}
}
}
Now you can do:
'Dummy {
x := (y, z)
y := (x, z)
}
Which builds the following definition (as printed in the REPL):
Definition(Dummy,Vector(Eq(Var(x),WrappedArray(Var(y), Var(z))), Eq(Var(y),WrappedArray(Var(x), Var(z)))))

Scala Newb Question - about scoping and variables

I'm parsing XML, and keep finding myself writing code like:
val xml = <outertag>
<dog>val1</dog>
<cat>val2</cat>
</outertag>
var cat = ""
var dog = ""
for (inner <- xml \ "_") {
inner match {
case <dog>{ dg @ _* }</dog> => dog = dg(0).toString()
case <cat>{ ct @ _* }</cat> => cat = ct(0).toString()
}
}
/* do something with dog and cat */
It annoys me because I should be able to declare cat and dog as val (immutable), since I only need to set them once, but I have to make them mutable. And besides that it just seems like there must be a better way to do this in scala. Any ideas?
Here are two (now make it three) possible solutions. The first one is pretty quick and dirty. You can run the whole bit in the Scala interpreter.
val xmlData = <outertag>
<dog>val1</dog>
<cat>val2</cat>
</outertag>
// A very simple way to do this mapping.
def simpleGetNodeValue(x:scala.xml.NodeSeq, tag:String) = (x \\ tag).text
val cat = simpleGetNodeValue(xmlData, "cat")
val dog = simpleGetNodeValue(xmlData, "dog")
cat will be "val2", and dog will be "val1".
Note that if either node is not found, an empty string will be returned. You can work around this, or you could write it in a slightly more idiomatic way:
// A more idiomatic Scala way, even though Scala wouldn't give us nulls.
// This returns an Option[String].
def getNodeValue(x:scala.xml.NodeSeq, tag:String) = {
(x \\ tag).text match {
case "" => None
case x:String => Some(x)
}
}
val cat1 = getNodeValue(xmlData, "cat") getOrElse "No cat found."
val dog1 = getNodeValue(xmlData, "dog") getOrElse "No dog found."
val goat = getNodeValue(xmlData, "goat") getOrElse "No goat found."
cat1 will be "val2", dog1 will be "val1", and goat will be "No goat found."
UPDATE: Here's one more convenience method to take a list of tag names and return their matches as a Map[String, String].
// Searches for all tags in the List and returns a Map[String, String].
def getNodeValues(x:scala.xml.NodeSeq, tags:List[String]) = {
tags.foldLeft(Map[String, String]()) { (a, b) => a + (b -> simpleGetNodeValue(x, b)) }
}
val tagsToMatch = List("dog", "cat")
val matchedValues = getNodeValues(xmlData, tagsToMatch)
If you run that, matchedValues will be Map(dog -> val1, cat -> val2).
Hope that helps!
UPDATE 2: Per Daniel's suggestion, I'm using the double-backslash operator, which will descend into child elements, which may be better as your XML dataset evolves.
scala> val xml = <outertag><dog>val1</dog><cat>val2</cat></outertag>
xml: scala.xml.Elem = <outertag><dog>val1</dog><cat>val2</cat></outertag>
scala> val cat = xml \\ "cat" text
cat: String = val2
scala> val dog = xml \\ "dog" text
dog: String = val1
Consider wrapping up the XML inspection and pattern matching in a function that returns the multiple values you need as a tuple (Tuple2[String, String]). But stop and consider: it looks like it's possible to not match any dog and cat elements, which would leave you returning null for one or both of the tuple components. Perhaps you could return a tuple of Option[String], or throw if either of the element patterns fail to bind.
In any case, you can generally solve these initialization problems by wrapping up the constituent statements into a function to yield an expression. Once you have an expression in hand, you can initialize a constant with the result of its evaluation.
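A small sketch of that idea. To stay dependency-free it uses (tag, text) pairs as a stand-in for the parsed child elements, which is an assumption; with scala.xml the extract function would inspect the nodes instead. The function returns a tuple of Option[String], and both vals are initialised from it in one expression.

```scala
object TupleInitDemo {
  // Stand-in for the child elements of <outertag>: (tag name, text) pairs.
  val children: Seq[(String, String)] = Seq("dog" -> "val1", "cat" -> "val2")

  // Hypothetical helper: wraps the matching logic and yields both values at once.
  def extract(nodes: Seq[(String, String)]): (Option[String], Option[String]) = {
    // The backticks match against the value of the parameter `tag`.
    def first(tag: String): Option[String] =
      nodes.collectFirst { case (`tag`, text) => text }
    (first("dog"), first("cat"))
  }

  // Both names are immutable vals, bound once from the resulting tuple.
  val (dog, cat) = extract(children)

  def main(args: Array[String]): Unit = println((dog, cat))
}
```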

Accessing Scala Parser regular expression match data

I am wondering if it's possible to get the MatchData generated by the matching regular expression in the grammar below.
object DateParser extends JavaTokenParsers {
....
val dateLiteral = """(\d{4}[-/])?(\d\d[-/])?(\d\d)""".r ^^ {
... get MatchData
}
}
One option of course is to perform the match again inside the block, but since the RegexParser has already performed the match I'm hoping that it passes the MatchData to the block, or stores it?
Here is the implicit definition that converts your Regex into a Parser:
/** A parser that matches a regex string */
implicit def regex(r: Regex): Parser[String] = new Parser[String] {
def apply(in: Input) = {
val source = in.source
val offset = in.offset
val start = handleWhiteSpace(source, offset)
(r findPrefixMatchOf (source.subSequence(start, source.length))) match {
case Some(matched) =>
Success(source.subSequence(start, start + matched.end).toString,
in.drop(start + matched.end - offset))
case None =>
Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
}
}
}
Just adapt it:
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers
object X extends RegexParsers {
/** A parser that matches a regex string and returns the Match */
def regexMatch(r: Regex): Parser[Regex.Match] = new Parser[Regex.Match] {
def apply(in: Input) = {
val source = in.source
val offset = in.offset
val start = handleWhiteSpace(source, offset)
(r findPrefixMatchOf (source.subSequence(start, source.length))) match {
case Some(matched) =>
Success(matched,
in.drop(start + matched.end - offset))
case None =>
Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
}
}
}
val t = regexMatch("""(\d\d)/(\d\d)/(\d\d\d\d)""".r) ^^ { case m => (m.group(1), m.group(2), m.group(3)) }
}
Example:
scala> X.parseAll(X.t, "23/03/1971")
res8: X.ParseResult[(String, String, String)] = [1.11] parsed: (23,03,1971)
No, you can't do this. If you look at the definition of the Parser used when you convert a regex to a Parser, it throws away all context and just returns the full matched string:
http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_7_7_final/src/library/scala/util/parsing/combinator/RegexParsers.scala?view=markup#L55
You have a couple of other options, though:
break up your parser into several smaller parsers (for the tokens you actually want to extract)
define a custom parser that extracts the values you want and returns a domain object instead of a string
The first would look like
val separator = "-" | "/"
val year = ("""\d{4}"""r) <~ separator
val month = ("""\d\d"""r) <~ separator
val day = """\d\d"""r
val date = ((year?) ~ (month?) ~ day) map {
case year ~ month ~ day =>
(year.getOrElse("2009"), month.getOrElse("11"), day)
}
The <~ means "require these two tokens together, but only give me the result of the first one".
The ~ means "require these two tokens together and tie them together in a pattern-matchable ~ object".
The ? means that the parser is optional and will return an Option.
The .getOrElse bit provides a default value for when the parser didn't define a value.
When a Regex is used in a RegexParsers instance, the implicit def regex(Regex): Parser[String] in RegexParsers is used to apply that Regex to the input. The Match instance yielded upon successful application of the RE at the current input is used to construct a Success in the regex() method, but only its "end" value is used, so any captured sub-matches are discarded by the time that method returns.
As it stands (in the 2.7 source I looked at), you're out of luck, I believe.
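For comparison, here is what the Match offers when findPrefixMatchOf is called directly, outside the parser combinators, using only the standard library. The object name and the parseDate helper are assumptions; the regex is the dateLiteral pattern from the question.

```scala
import scala.util.matching.Regex

object MatchDataDemo {
  val dateLiteral: Regex = """(\d{4}[-/])?(\d\d[-/])?(\d\d)""".r

  // Returns (year part, month part, day); optional groups that did not
  // participate in the match are reported by Match.group as null, hence Option(...).
  def parseDate(input: String): Option[(Option[String], Option[String], String)] =
    dateLiteral.findPrefixMatchOf(input).map { m =>
      (Option(m.group(1)), Option(m.group(2)), m.group(3))
    }

  def main(args: Array[String]): Unit = {
    println(parseDate("2009-11-23"))
    println(parseDate("23"))
  }
}
```

The captured year and month still include their separator, because the separator sits inside the capturing group in the original pattern.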
I ran into a similar issue using Scala 2.8.1 and trying to parse input of the form "name:value" using the RegexParsers class:
package scalucene.query
import scala.util.matching.Regex
import scala.util.parsing.combinator._
object QueryParser extends RegexParsers {
override def skipWhitespace = false
private def quoted = regex(new Regex("\"[^\"]+"))
private def colon = regex(new Regex(":"))
private def word = regex(new Regex("\\w+"))
private def fielded = (regex(new Regex("[^:]+")) <~ colon) ~ word
private def term = (fielded | word | quoted)
def parseItem(str: String) = parse(term, str)
}
It seems that you can grab the matched groups after parsing like this:
QueryParser.parseItem("nameExample:valueExample") match {
case QueryParser.Success(QueryParser.~(name, value), _) => {
println("Name: " + name + " value: " + value)
}
}