How to distinguish triple quotes from single quotes in macros? - scala

I am writing a macro m(expr: String), where expr is an expression in some language (not Scala):
m("SOME EXPRESSION")
m("""
SOME EXPRESSION
""")
When I am parsing the expression I would like to report error messages with proper locations in the source file. To achieve this I should know the location of the string literal itself and the number of quotes of the literal (3 or 1). Unfortunately, I did not find any method that returns the number of quotes of the literal:
import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context
object Temp {
def m(s: String) : String = macro mImpl
def mImpl(context: Context)(s: context.Expr[String]): context.universe.Tree = {
import context.universe._
s match {
case l @ Literal(Constant(p: String)) =>
if (l.<TRIPLE QUOTES>) {
...
} else {
...
}
case _ =>
context.abort(context.enclosingPosition, "The argument of m must be a string literal")
}
}
}
What should I put instead of <TRIPLE QUOTES>?

The only way I can think of is to access the source file and check whether the literal starts with triple quotes:
l.tree.pos.source.content.startsWith("\"\"\"",l.tree.pos.start)
You also need to change your match case, because s is an Expr rather than a Tree:
case l @ Expr(Literal(Constant(p: String))) =>
Here the version with some explanation:
val tree: context.universe.Tree = l.tree
val pos: scala.reflect.api.Position = tree.pos
val source: scala.reflect.internal.util.SourceFile = pos.source
val content: Array[Char] = source.content
val start: Int = pos.start
val isTriple: Boolean = content.startsWith("\"\"\"",start)
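Once the quote style is known, it can be used to map an offset inside the string value back to an offset in the source file. The following is a minimal sketch that works on a plain Array[Char], so it can be tried outside a macro. The names QuoteOffsets, openingQuotes and sourceOffset are hypothetical, and the arithmetic assumes that every character of the value occupies exactly one character in the source, which does not hold for a single-quoted literal that contains escape sequences.

```scala
object QuoteOffsets {
  private val Triple = "\"\"\""

  /** Number of quote characters that open the literal starting at `start`: 3 or 1. */
  def openingQuotes(content: Array[Char], start: Int): Int =
    if (content.slice(start, start + 3).mkString == Triple) 3 else 1

  /** Source offset of the character at `offsetInValue` within the literal's value.
    * Assumption: no escape sequences, so value offsets map 1:1 to source offsets.
    */
  def sourceOffset(content: Array[Char], start: Int, offsetInValue: Int): Int =
    start + openingQuotes(content, start) + offsetInValue
}
```

Inside the macro, content and start would come from pos.source.content and pos.start as shown above.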

Related

Parse CSV and add only matching rows to List functionally in Scala

I am reading a CSV file in Scala.
Person is a case class:
case class Person(name: String, address: String)
def getData(path: String, existingName: String) : List[Person] = {
Source.fromFile("my_file_path").getLines.drop(1).map(l => {
val data = l.split("|", -1).map(_.trim).toList
val personName = data(0)
if(personName.equalsIgnoreCase(existingName)) {
val address=data(1)
Person(personName,address)
//here I want to add to list
}
else
Nil
///here return empty list of zero length
}).toList()
}
I want to achieve this functionally in scala.
Here's the basic approach to what I think you're trying to do.
case class Person(name:String, address:String)
def getData(path:String, existingName:String) :List[Person] = {
val recordPattern = raw"\s*(?i)($existingName)\s*\|\s*(.*)".r.unanchored
io.Source.fromFile(path).getLines.drop(1).collect {
case recordPattern(name,addr) => Person(name, addr.trim)
}.toList
}
This doesn't close the file reader or report the error if the file can't be opened, which you really should do, but we'll leave that for a different day.
update: added file closing and error handling via Using (Scala 2.13)
import scala.util.{Using, Try}
case class Person(name:String, address:String)
def getData(path:String, existingName:String) :Try[List[Person]] =
Using(io.Source.fromFile(path)){ file =>
val recordPattern = raw"\s*(?i)($existingName)\s*\|\s*([^|]*)".r
file.getLines.drop(1).collect {
case recordPattern(name,addr) => Person(name, addr.trim)
}.toList
}
updated update
OK. Here's a version that:
reports the error if the file can't be opened
closes the file after it's been opened and read
ignores unwanted spaces and quote marks
is pre-2.13 compiler friendly
import scala.util.Try
case class Person(name:String, address:String)
def getData(path:String, existingName:String) :List[Person] = {
val recordPattern =
raw"""[\s"]*(?i)($existingName)["\s]*\|[\s"]*([^"|]*).*""".r
val file = Try(io.Source.fromFile(path))
val res = file.fold(
err => {println(err); List.empty[Person]},
_.getLines.drop(1).collect {
case recordPattern(name,addr) => Person(name, addr.trim)
}.toList)
file.map(_.close)
res
}
And here's how the regex works:
[\s"]* there might be spaces or quote marks
(?i) matching is case-insensitive
($existingName) match and capture this string (1st capture group)
["\s]* there might be spaces or quote marks
\| there will be a bar character
[\s"]* there might be spaces or quote marks
([^"|]*) match and capture everything that isn't quote or bar
.* ignore anything that might come thereafter
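To see the pieces of that regex working together, here is a small sketch that applies it to single lines held in memory instead of a file. The object name RecordRegex, the helper parseLine and the sample lines are made up for illustration. Note that existingName is spliced into the pattern unescaped, as in the answer, so a name containing regex metacharacters would need quoting first.

```scala
object RecordRegex {
  final case class Person(name: String, address: String)

  /** Tries to parse one line; returns None when the name does not match. */
  def parseLine(line: String, existingName: String): Option[Person] = {
    // Same pattern as in the answer: optional spaces/quotes, the name
    // (case-insensitive), a bar, then the address up to a quote or bar.
    val recordPattern =
      raw"""[\s"]*(?i)($existingName)["\s]*\|[\s"]*([^"|]*).*""".r
    line match {
      case recordPattern(name, addr) => Some(Person(name, addr.trim))
      case _                         => None
    }
  }
}
```

Because a Regex used as a pattern must match the whole line, a line such as bobby|x is rejected when looking for bob: after the captured name only spaces or quotes may precede the bar.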
You were not very clear about what the problem with your approach was, but this should do the trick (it is very close to what you have):
def getData(path:String, existingName: String) : List[Person] = {
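// requires: import scala.io.Source
val source = Source.fromFile(path)
val lst = source.getLines.drop(1).flatMap(l => {
// split takes a regex, so the bar must be escaped; a bare "|" matches the empty string
val data = l.split("\\|", -1).map(_.trim).toList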
val personName = data.head
if (personName.equalsIgnoreCase(existingName)) {
val address = data(1)
Option(Person(personName, address))
}
else
Option.empty
}).toList
source.close()
lst
}
We read the file line by line. For each line we extract personName from the first CSV field; if it is the one we are looking for we return an Option containing a Person, otherwise Option.empty. Using flatMap discards the empty options (this just avoids using Nil).
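The same flatMap-over-Option idea can be tried without a file. This is only a sketch: the object name OptionFlatMap, the helper matching and the sample lines are invented for the example, and the list pattern additionally skips lines that have fewer than two fields.

```scala
object OptionFlatMap {
  final case class Person(name: String, address: String)

  /** Keeps only the lines whose first field equals existingName (ignoring case). */
  def matching(lines: List[String], existingName: String): List[Person] =
    lines.flatMap { l =>
      // The bar is escaped because String.split interprets its argument as a regex.
      l.split("\\|", -1).map(_.trim).toList match {
        case name :: address :: _ if name.equalsIgnoreCase(existingName) =>
          Some(Person(name, address)) // kept by flatMap
        case _ =>
          None                        // dropped by flatMap
      }
    }
}
```

For instance, OptionFlatMap.matching(List("bob|12 Main St", "alice|Elsewhere"), "Bob") keeps only the first line.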

Addition of numbers recursively in Scala

In this Scala code I'm trying to analyze a string that contains a sum (such as 12+3+5) and return the result (20). I'm using regex to extract the first digit and parse the trail to be added recursively. My issue is that since the regex returns a String, I cannot add up the numbers. Any ideas?
object TestRecursive extends App {
val plus = """(\w*)\+(\w*)""".r
println(parse("12+3+5"))
def parse(str: String) : String = str match {
// sum
case plus(head, trail) => parse(head) + parse(trail)
case _ => str
}
}
You might want to use the parser combinators for an application like this.
"""(\w*)\+(\w*)""".r also matches "+" or "23+" or "4 +5" // but captures it only in the first group.
What you could do instead is extract the signed numbers and add them up:
scala> val numbers = "[+-]?\\d+"
numbers: String = [+-]?\d+
scala> numbers.r.findAllIn("1+2-3+42").map(_.toInt).reduce(_ + _)
res4: Int = 42
scala> numbers.r.findAllIn("12+3+5").map(_.toInt).reduce(_ + _)
res5: Int = 20
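If you would rather keep the recursive shape from the question, one possible variation is to let parse return an Int and convert each operand as it is matched. This is a sketch, not the answer's method: the regex is changed so that the first group takes only the leading digits and the second group takes the whole remainder, and the object name SumParser is hypothetical. Malformed input such as "1+" still throws a NumberFormatException.

```scala
object SumParser {
  // First group: leading digits; second group: everything after the first '+'.
  private val plus = """(\d+)\+(.*)""".r

  def parse(str: String): Int = str match {
    case plus(head, tail) => head.toInt + parse(tail) // Int addition, not concatenation
    case _                => str.toInt                // base case: a single number
  }
}
```

With this definition, SumParser.parse("12+3+5") evaluates 12 + (3 + 5).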

How to return a value from a Scala def

I am new to Scala, but have some experience with Haskell. I did the following:
import scala.io.Source
val fileContent = Source.fromFile(filename).getLines.toList
val content = fileContent.map(processLine)
def processLine(line: String){
val words = line.split("\\s+")
println((words(0), words(1)))
}
Here processLine doesn't return anything so content is now a list of empty return values for all items. I thought the solution would be to include a return value in processLine, but Scala doesn't like that:
warning: enclosing method processLine has result type Unit: return value discarded
So how can I modify processLine so that it can be used to create a list of non-empty tuple values in content? Also, how would I declare a lambda function that spans more than one line?
Thanks to helpful info in this thread, I could also have written it with a lambda expression:
var nonLinearTrainingContent = fileContent.map(x=> {
val words = x.split("\\s+")
(words(0), words(2))
})
There are two things that prevent a result being returned:
println returns Unit
Your function definition is shorthand for a method returning Unit
This would give you the result you expected:
def processLine(line: String) : (String,String) = {
val words = line.split("\\s+")
val result = (words(0), words(1))
println(result)
result
}
As asked, here is the same thing expressed as a function value:
val processLineFun : String => (String, String) = line => {
val words = line.split("\\s+")
val result = (words(0), words(1))
println(result)
result
}
Make the tuple (words(0), words(1)) the last line of processLine function:
def processLine(line: String) = {
val words = line.split("\\s+")
println((words(0), words(1)))
(words(0), words(1))
}
Edit: use curly braces for multiline lambda function or separate operators with ';' for one-line lambda
Edit2: fixed return type

Filtering inside `for` with pattern matching

I am reading a TSV file using something like this:
case class Entry(entryType: Int, value: Int)
def filterEntries(): Iterator[Entry] = {
for {
line <- scala.io.Source.fromFile("filename").getLines()
} yield new Entry(line.split("\t").map(x => x.toInt))
}
Now I am interested both in filtering out entries whose entryType is 0 and in ignoring lines whose column count is not exactly 2 (lines that do not match the constructor). I was wondering if there is an idiomatic way to achieve this, maybe using pattern matching and an unapply method in a companion object. The only thing I can think of is using .filter on the resulting iterator.
I will also accept a solution that does not involve a for loop, as long as it returns Iterator[Entry]. The solutions must be tolerant of malformed input.
This is more state-of-arty:
package object liner {
implicit class R(val sc: StringContext) {
object r {
def unapplySeq(s: String): Option[Seq[String]] = sc.parts.mkString.r unapplySeq s
}
}
}
package liner {
case class Entry(entryType: Int, value: Int)
object I {
def unapply(s: String): Option[Int] = util.Try(s.toInt).toOption
}
object Test extends App {
def lines = List("1 2", "3", "", " 4 5 ", "junk", "0, 100000", "6 7 8")
def entries = lines flatMap {
case r"""\s*${I(i)}(\d+)\s+${I(j)}(\d+)\s*""" if i != 0 => Some(Entry(i, j))
case __________________________________________________ => None
}
Console println entries
}
}
Hopefully, the regex interpolator will make it into the standard distro soon, but this shows how easy it is to rig up. Also hopefully, a scanf-style interpolator will allow easy extraction with case f"$i%d".
I just started using the "elongated wildcard" in patterns to align the arrows.
There is a pupal or maybe larval regex macro:
https://github.com/som-snytt/regextractor
You can create variables in the head of the for-comprehension and then use a guard:
edit: ensure length of array
for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2 && arr(0) != 0
} yield new Entry(arr(0), arr(1))
I have solved it using the following code:
import scala.util.{Try, Success}
val lines = List(
"1\t2",
"1\t",
"2",
"hello",
"1\t3"
)
case class Entry(val entryType: Int, val value: Int)
object Entry {
def unapply(line: String) = {
line.split("\t").map(x => Try(x.toInt)) match {
case Array(Success(entryType: Int), Success(value: Int)) => Some(Entry(entryType, value))
case _ =>
println("Malformed line: " + line)
None
}
}
}
for {
line <- lines
entryOption = Entry.unapply(line)
if entryOption.isDefined
} yield entryOption.get
The left hand side of a <- or = in a for-loop may be a fully-fledged pattern. So you may write this:
def filterEntries(): Iterator[Entry] = for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2
// now you may use pattern matching to extract the array
Array(entryType, value) = arr
// keep only the entries whose entryType is not 0
if entryType != 0
} yield Entry(entryType, value)
Note that this solution will throw a NumberFormatException if a field is not convertible to an Int. If you do not want that, you'll have to encapsulate x.toInt with a Try and pattern match again.
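A rough sketch of that Try-based variant follows. It takes the lines as an Iterator[String] so it can be run without a file; the object name TolerantEntries and the function parseEntries are hypothetical. Lines with the wrong number of columns, non-numeric fields, or an entryType of 0 are silently skipped.

```scala
import scala.util.Try

object TolerantEntries {
  final case class Entry(entryType: Int, value: Int)

  def parseEntries(lines: Iterator[String]): Iterator[Entry] =
    lines.flatMap { line =>
      line.split("\t") match {
        // Exactly two columns: try to convert both, then apply the filter.
        case Array(a, b) =>
          for {
            entryType <- Try(a.trim.toInt).toOption
            value     <- Try(b.trim.toInt).toOption
            if entryType != 0
          } yield Entry(entryType, value)
        // Any other column count is treated as malformed.
        case _ => None
      }
    }
}
```

With a real file, the argument would be scala.io.Source.fromFile("filename").getLines(), and the result stays a lazy Iterator[Entry].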

Accessing Scala Parser regular expression match data

I am wondering if it's possible to get the MatchData generated by matching the regular expression in the grammar below.
object DateParser extends JavaTokenParsers {
....
val dateLiteral = """(\d{4}[-/])?(\d\d[-/])?(\d\d)""".r ^^ {
... get MatchData
}
}
One option of course is to perform the match again inside the block, but since the RegexParser has already performed the match I'm hoping that it passes the MatchData to the block, or stores it?
Here is the implicit definition that converts your Regex into a Parser:
/** A parser that matches a regex string */
implicit def regex(r: Regex): Parser[String] = new Parser[String] {
def apply(in: Input) = {
val source = in.source
val offset = in.offset
val start = handleWhiteSpace(source, offset)
(r findPrefixMatchOf (source.subSequence(start, source.length))) match {
case Some(matched) =>
Success(source.subSequence(start, start + matched.end).toString,
in.drop(start + matched.end - offset))
case None =>
Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
}
}
}
Just adapt it:
object X extends RegexParsers {
/** A parser that matches a regex string and returns the Match */
def regexMatch(r: Regex): Parser[Regex.Match] = new Parser[Regex.Match] {
def apply(in: Input) = {
val source = in.source
val offset = in.offset
val start = handleWhiteSpace(source, offset)
(r findPrefixMatchOf (source.subSequence(start, source.length))) match {
case Some(matched) =>
Success(matched,
in.drop(start + matched.end - offset))
case None =>
Failure("string matching regex `"+r+"' expected but `"+in.first+"' found", in.drop(start - offset))
}
}
}
val t = regexMatch("""(\d\d)/(\d\d)/(\d\d\d\d)""".r) ^^ { case m => (m.group(1), m.group(2), m.group(3)) }
}
Example:
scala> X.parseAll(X.t, "23/03/1971")
res8: X.ParseResult[(String, String, String)] = [1.11] parsed: (23,03,1971)
No, you can't do this. If you look at the definition of the Parser used when you convert a regex to a Parser, it throws away all context and just returns the full matched string:
http://lampsvn.epfl.ch/trac/scala/browser/scala/tags/R_2_7_7_final/src/library/scala/util/parsing/combinator/RegexParsers.scala?view=markup#L55
You have a couple of other options, though:
break up your parser into several smaller parsers (for the tokens you actually want to extract)
define a custom parser that extracts the values you want and returns a domain object instead of a string
The first would look like
val separator = "-" | "/"
val year = ("""\d{4}"""r) <~ separator
val month = ("""\d\d"""r) <~ separator
val day = """\d\d"""r
val date = ((year?) ~ (month?) ~ day) map {
case year ~ month ~ day =>
(year.getOrElse("2009"), month.getOrElse("11"), day)
}
The <~ means "require these two tokens together, but only give me the result of the first one".
The ~ means "require these two tokens together and tie them together in a pattern-matchable ~ object".
The ? means that the parser is optional and will return an Option.
The .getOrElse bit provides a default value for when the parser didn't define a value.
When a Regex is used in a RegexParsers instance, the implicit def regex(Regex): Parser[String] in RegexParsers is used to apply that Regex to the input. The Match instance yielded upon successful application of the RE at the current input is used to construct a Success in the regex() method, but only its "end" value is used, so any captured sub-matches are discarded by the time that method returns.
As it stands (in the 2.7 source I looked at), you're out of luck, I believe.
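For reference, this is the information that the Match carries before it is dropped. The sketch below uses only scala.util.matching.Regex from the standard library, outside of any parser; the object name PrefixMatch, the helper dateParts and the sample input are made up for illustration.

```scala
import scala.util.matching.Regex

object PrefixMatch {
  val date: Regex = """(\d\d)/(\d\d)/(\d\d\d\d)""".r

  /** Day, month, year and the end offset of the match, if the input starts with a date. */
  def dateParts(input: String): Option[(String, String, String, Int)] =
    // findPrefixMatchOf is the same call the regex parser makes on its input.
    date.findPrefixMatchOf(input).map { m =>
      (m.group(1), m.group(2), m.group(3), m.end)
    }
}
```

The built-in regex parser keeps only the equivalent of m.end, which is why the groups are not available in the ^^ block.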
I ran into a similar issue using scala 2.8.1 and trying to parse input of the form "name:value" using the RegexParsers class:
package scalucene.query
import scala.util.matching.Regex
import scala.util.parsing.combinator._
object QueryParser extends RegexParsers {
override def skipWhitespace = false
private def quoted = regex(new Regex("\"[^\"]+"))
private def colon = regex(new Regex(":"))
private def word = regex(new Regex("\\w+"))
private def fielded = (regex(new Regex("[^:]+")) <~ colon) ~ word
private def term = (fielded | word | quoted)
def parseItem(str: String) = parse(term, str)
}
It seems that you can grab the matched groups after parsing like this:
QueryParser.parseItem("nameExample:valueExample") match {
case QueryParser.Success(result:scala.util.parsing.combinator.Parsers$$tilde, _) => {
println("Name: " + result.productElement(0) + " value: " + result.productElement(1))
}
}