Scala - how to print case classes like a (pretty printed) tree

I'm making a parser with Scala Combinators. It is awesome. What I end up with is a long list of entangled case classes, like ClassDecl(Complex,List(VarDecl(Real,float), VarDecl(Imag,float))), just 100x longer. I was wondering if there is a good way to print case classes like these in a tree-like fashion so that it's easier to read (or in some other form of pretty print)?
ClassDecl
  name = Complex
  fields =
    - VarDecl
      name = Real
      type = float
    - VarDecl
      name = Imag
      type = float
^ I want to end up with something like this
Edit: Bonus question
Is there also a way to show the name of the parameter? Like: ClassDecl(name = Complex, fields = List( ... ))?

Check out a small extensions library named sext. It exports the functions treeString and valueTreeString exactly for purposes like that.
Here's how it can be used for your example:
object Demo extends App {
  import sext._

  case class ClassDecl(kind: Kind, list: List[VarDecl])
  sealed trait Kind
  case object Complex extends Kind
  case class VarDecl(a: Int, b: String)

  val data = ClassDecl(Complex, List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))

  println("treeString output:\n")
  println(data.treeString)
  println()
  println("valueTreeString output:\n")
  println(data.valueTreeString)
}
Following is the output of this program:
treeString output:
ClassDecl:
- Complex
- List:
| - VarDecl:
| | - 1
| | - abcd
| - VarDecl:
| | - 2
| | - efgh
valueTreeString output:
- kind:
- list:
| - - a:
| | | 1
| | - b:
| | | abcd
| - - a:
| | | 2
| | - b:
| | | efgh

Starting in Scala 2.13, case classes (which are an implementation of Product) are provided with a productElementNames method, which returns an iterator over their fields' names.
Combined with Product::productIterator, which provides the values of a case class, we have a simple way to pretty print case classes without requiring reflection:
def pprint(obj: Any, depth: Int = 0, paramName: Option[String] = None): Unit = {
  val indent = "  " * depth
  val prettyName = paramName.fold("")(x => s"$x: ")
  val ptype = obj match {
    case _: Iterable[Any] => ""
    case obj: Product     => obj.productPrefix
    case _                => obj.toString
  }

  println(s"$indent$prettyName$ptype")

  obj match {
    case seq: Iterable[Any] =>
      seq.foreach(pprint(_, depth + 1))
    case obj: Product =>
      (obj.productIterator zip obj.productElementNames)
        .foreach { case (subObj, paramName) => pprint(subObj, depth + 1, Some(paramName)) }
    case _ =>
  }
}
which for your specific scenario:
// sealed trait Kind
// case object Complex extends Kind
// case class VarDecl(a: Int, b: String)
// case class ClassDecl(kind: Kind, decls: List[VarDecl])
val data = ClassDecl(Complex, List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))
pprint(data)
produces:
ClassDecl
  kind: Complex
  decls:
    VarDecl
      a: 1
      b: abcd
    VarDecl
      a: 2
      b: efgh

Use the com.lihaoyi.pprint library.
libraryDependencies += "com.lihaoyi" %% "pprint" % "0.4.1"
val data = ...
val str = pprint.tokenize(data).mkString
println(str)
you can also configure width, height, indent and colors:
pprint.tokenize(data, width = 80).mkString
Docs: https://github.com/com-lihaoyi/PPrint
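For example, a minimal self-contained sketch (assuming a pprint version where tokenize takes these named parameters, as in the snippet above):
case class VarDecl(a: Int, b: String)
case class ClassDecl(kind: String, decls: List[VarDecl])

val data = ClassDecl("Complex", List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))

// A small width forces the nested case classes onto separate, indented lines.
println(pprint.tokenize(data, width = 20).mkString)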

Here's my solution, which greatly improves how http://www.lihaoyi.com/PPrint/ handles case classes (see https://github.com/lihaoyi/PPrint/issues/4 ).
It prints the field names as well, e.g. something like Book(isbn = "978-0486282114", author = Author(firstName = "first", lastName = "last")) for a usage like this:
pprint2 = pprint.copy(additionalHandlers = pprintAdditionalHandlers)
case class Author(firstName: String, lastName: String)
case class Book(isbn: String, author: Author)
val b = Book("978-0486282114", Author("first", "last"))
pprint2.pprintln(b)
code:
import pprint.{PPrinter, Tree, Util}

object PPrintUtils {

  // in scala 2.13 this would be even simpler/cleaner due to added product.productElementNames
  protected def caseClassToMap(cc: Product): Map[String, Any] = {
    val fieldValues = cc.productIterator.toSet
    val fields = cc.getClass.getDeclaredFields.toSeq
      .filterNot(f => f.isSynthetic || java.lang.reflect.Modifier.isStatic(f.getModifiers))
    fields.map { f =>
      f.setAccessible(true)
      f.getName -> f.get(cc)
    }.filter { case (k, v) => fieldValues.contains(v) }
      .toMap
  }

  var pprint2: PPrinter = _

  protected def pprintAdditionalHandlers: PartialFunction[Any, Tree] = {
    case x: Product =>
      val className = x.getClass.getName
      // see source code for pprint.treeify()
      val shouldNotPrettifyCaseClass =
        x.productArity == 0 ||
          (x.productArity == 2 && Util.isOperator(x.productPrefix)) ||
          className.startsWith(pprint.tuplePrefix) ||
          className == "scala.Some"

      if (shouldNotPrettifyCaseClass)
        pprint.treeify(x)
      else {
        val fieldMap = caseClassToMap(x)
        pprint.Tree.Apply(
          x.productPrefix,
          fieldMap.iterator.flatMap { case (k, v) =>
            val prettyValue: Tree = pprintAdditionalHandlers.lift(v).getOrElse(pprint2.treeify(v))
            Seq(pprint.Tree.Infix(Tree.Literal(k), "=", prettyValue))
          }
        )
      }
  }

  pprint2 = pprint.copy(additionalHandlers = pprintAdditionalHandlers)
}
// usage
pprint2.pprintln(SomeFancyObjectWithNestedCaseClasses(...))

import java.lang.reflect.Field
...
/**
 * Pretty prints case classes with field names.
 * Handles sequences and arrays of such values.
 * Ideally, one could take the output and paste it into source code and have it compile.
 */
def prettyPrint(a: Any): String = {
  // Recursively get all the fields; this will grab vals declared in parents of case classes.
  def getFields(cls: Class[_]): List[Field] =
    Option(cls.getSuperclass).map(getFields).getOrElse(Nil) ++
      cls.getDeclaredFields.toList.filterNot(f =>
        f.isSynthetic || java.lang.reflect.Modifier.isStatic(f.getModifiers))

  a match {
    // Make Strings look similar to their literal form.
    case s: String =>
      '"' + Seq("\n" -> "\\n", "\r" -> "\\r", "\t" -> "\\t", "\"" -> "\\\"", "\\" -> "\\\\").foldLeft(s) {
        case (acc, (c, r)) => acc.replace(c, r)
      } + '"'
    case xs: Seq[_] =>
      xs.map(prettyPrint).toString
    case xs: Array[_] =>
      s"Array(${xs.map(prettyPrint) mkString ", "})"
    // This covers case classes.
    case p: Product =>
      s"${p.productPrefix}(${
        (getFields(p.getClass) map { f =>
          f setAccessible true
          s"${f.getName} = ${prettyPrint(f.get(p))}"
        }) mkString ", "
      })"
    // General objects and primitives end up here.
    case q =>
      Option(q).map(_.toString).getOrElse("¡null!")
  }
}
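For instance, applied to the classes from the question it should produce output along these lines (a quick hypothetical check; the exact field order comes from reflection and may vary):
case class VarDecl(name: String, `type`: String)
case class ClassDecl(name: String, fields: List[VarDecl])

val data = ClassDecl("Complex", List(VarDecl("Real", "float"), VarDecl("Imag", "float")))

// Expected shape:
// ClassDecl(name = "Complex", fields = List(VarDecl(name = "Real", type = "float"), VarDecl(name = "Imag", type = "float")))
println(prettyPrint(data))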

Just like parser combinators, the Scala standard library already contains pretty-printer combinators (scala.text). (Note: this library is deprecated as of Scala 2.11. A similar pretty-printing library is part of the Kiama open source project.)
Your question doesn't say plainly whether you need a solution that does "reflection" or whether you'd like to build the printer explicitly (though your "bonus question" hints that you probably want the reflective one).
Anyway, in case you'd like to develop a simple pretty printer using just the standard library, here it is. The following code is REPLable.
case class VarDecl(name: String, `type`: String)
case class ClassDecl(name: String, fields: List[VarDecl])

import scala.text._
import Document._

def varDoc(x: VarDecl) =
  nest(4, text("- VarDecl") :/:
    group("name = " :: text(x.name)) :/:
    group("type = " :: text(x.`type`))
  )

def classDoc(x: ClassDecl) = {
  val docs = ((empty: Document) /: x.fields) { (d, f) => varDoc(f) :/: d }
  nest(2, text("ClassDecl") :/:
    group("name = " :: text(x.name)) :/:
    group("fields =" :/: docs))
}

def prettyPrint(d: Document) = {
  val writer = new java.io.StringWriter
  d.format(1, writer)
  writer.toString
}

prettyPrint(classDoc(
  ClassDecl("Complex", VarDecl("Real", "float") :: VarDecl("Imag", "float") :: Nil)
))
Bonus question: wrap the printers into type classes for even greater composability.
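For example, a minimal sketch of that idea on top of the varDoc/classDoc functions above (the Pretty trait and the show helper are hypothetical names, not part of the standard library):
trait Pretty[A] {
  def doc(a: A): Document
}

implicit val prettyVarDecl: Pretty[VarDecl] = new Pretty[VarDecl] {
  def doc(x: VarDecl) = varDoc(x)
}
implicit val prettyClassDecl: Pretty[ClassDecl] = new Pretty[ClassDecl] {
  def doc(x: ClassDecl) = classDoc(x)
}

// Works for any A that has a Pretty instance in scope.
def show[A](a: A)(implicit p: Pretty[A]): String = prettyPrint(p.doc(a))

show(ClassDecl("Complex", VarDecl("Real", "float") :: Nil))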

The nicest, most concise out-of-the-box experience I've found is with the Kiama pretty-printing library. It doesn't print member names without using additional combinators, but with only import org.kiama.output.PrettyPrinter._; pretty(any(data)) you have a great start:
case class ClassDecl(kind: Kind, list: List[VarDecl])
sealed trait Kind
case object Complex extends Kind
case class VarDecl(a: Int, b: String)

val data = ClassDecl(Complex, List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))

import org.kiama.output.PrettyPrinter._

// `w` is the wrapping width. `1` forces wrapping all components.
pretty(any(data), w = 1)
Produces:
ClassDecl (
    Complex (),
    List (
        VarDecl (
            1,
            "abcd"),
        VarDecl (
            2,
            "efgh")))
Note that this is just the most basic example. Kiama PrettyPrinter is an extremely powerful library with a rich set of combinators specifically designed for intelligent spacing, line wrapping, nesting, and grouping. It's very easy to tweak to suit your needs. As of this posting, it's available in SBT with:
libraryDependencies += "com.googlecode.kiama" %% "kiama" % "1.8.0"

Using reflection
import scala.reflect.ClassTag
import scala.reflect.runtime.universe._

object CaseClassBeautifier {

  def getCaseAccessors[T: TypeTag] = typeOf[T].members.collect {
    case m: MethodSymbol if m.isCaseAccessor => m
  }.toList

  def nice[T: TypeTag](x: T)(implicit classTag: ClassTag[T]): String = {
    val instance = x.asInstanceOf[T]
    val mirror = runtimeMirror(instance.getClass.getClassLoader)
    val accessors = getCaseAccessors[T]
    var res = List.empty[String]
    accessors.foreach { z =>
      val instanceMirror = mirror.reflect(instance)
      val fieldMirror = instanceMirror.reflectField(z.asTerm)
      val s = s"${z.name} = ${fieldMirror.get}"
      res = s :: res
    }
    val beautified = x.getClass.getSimpleName + "(" + res.mkString(", ") + ")"
    beautified
  }
}
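A usage sketch (reusing the VarDecl case class from the earlier answers; field order may vary since this is reflection-based):
case class VarDecl(a: Int, b: String)

val decl = VarDecl(1, "abcd")
// prints something like: VarDecl(a = 1, b = abcd)
println(CaseClassBeautifier.nice(decl))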

This is a shameless copy-paste of @F. P Freely's answer, but:
- I've added an indentation feature
- made slight modifications so that the output is in correct Scala style (and will compile for all primitive types)
- fixed the string literal bug
- added support for java.sql.Timestamp (as I use this with Spark a lot)
Tada!
import java.lang.reflect.Field
import java.sql.Timestamp

// Recursively get all the fields; this will grab vals declared in parents of case classes.
def getFields(cls: Class[_]): List[Field] =
  Option(cls.getSuperclass).map(getFields).getOrElse(Nil) ++
    cls.getDeclaredFields.toList.filterNot(f =>
      f.isSynthetic || java.lang.reflect.Modifier.isStatic(f.getModifiers))

// FIXME fix bug where indent seems to increase too much
def prettyfy(a: Any, indentSize: Int = 0): String = {
  val indent = List.fill(indentSize)(" ").mkString
  val newIndentSize = indentSize + 2
  (a match {
    // Make Strings look similar to their literal form.
    case string: String =>
      val conversionMap = Map('\n' -> "\\n", '\r' -> "\\r", '\t' -> "\\t", '\"' -> "\\\"", '\\' -> "\\\\")
      string.map(c => conversionMap.getOrElse(c, c)).mkString("\"", "", "\"")
    case xs: Seq[_] =>
      xs.map(prettyfy(_, newIndentSize)).toString
    case xs: Array[_] =>
      s"Array(${xs.map(prettyfy(_, newIndentSize)).mkString(", ")})"
    case map: Map[_, _] =>
      s"Map(\n" + map.map {
        case (key, value) => "  " + prettyfy(key, newIndentSize) + " -> " + prettyfy(value, newIndentSize)
      }.mkString(",\n") + "\n)"
    case None => "None"
    case Some(x) => "Some(" + prettyfy(x, newIndentSize) + ")"
    case timestamp: Timestamp => "new Timestamp(" + timestamp.getTime + "L)"
    case p: Product =>
      s"${p.productPrefix}(\n${
        getFields(p.getClass)
          .map { f =>
            f.setAccessible(true)
            s"  ${f.getName} = ${prettyfy(f.get(p), newIndentSize)}"
          }
          .mkString(",\n")
      }\n)"
    // General objects and primitives end up here.
    case q =>
      Option(q).map(_.toString).getOrElse("null")
  })
    .split("\n", -1).mkString("\n" + indent)
}
E.g.
case class Foo(bar: String, bob: Int)
case class Alice(foo: Foo, opt: Option[String], opt2: Option[String])

scala> prettyfy(Alice(Foo("hello world", 10), Some("asdf"), None))
res6: String =
Alice(
  foo = Foo(
    bar = "hello world",
    bob = 10
  ),
  opt = Some("asdf"),
  opt2 = None
)

If you use Apache Spark, you can use the following method to print your case classes:
def prettyPrint[T <: Product : scala.reflect.runtime.universe.TypeTag](c: T) = {
  import play.api.libs.json.Json
  println(Json.prettyPrint(Json.parse(Seq(c).toDS().toJSON.head)))
}
This gives a nicely formatted JSON representation of your case class instance. Make sure sparkSession.implicits._ is imported.
example:
case class Adress(country:String,city:String,zip:Int,street:String)
case class Person(name:String,age:Int,adress:Adress)
val person = Person("Peter",36,Adress("Switzerland","Zürich",9876,"Bahnhofstrasse 69"))
prettyPrint(person)
gives:
{
  "name" : "Peter",
  "age" : 36,
  "adress" : {
    "country" : "Switzerland",
    "city" : "Zürich",
    "zip" : 9876,
    "street" : "Bahnhofstrasse 69"
  }
}

I would suggest using the same printer that is used in the assertEquals of your test framework of choice. I was using Scalameta, and munit.Assertions.munitPrint(clue: => Any): String does the trick. I can pass nested case classes to it and see the whole tree with proper indentation.
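For example, something like this (a sketch assuming munit is on the classpath; munitPrint is the method mentioned above):
case class VarDecl(name: String, `type`: String)
case class ClassDecl(name: String, fields: List[VarDecl])

val data = ClassDecl("Complex", List(VarDecl("Real", "float"), VarDecl("Imag", "float")))

// Prints the nested structure with one field per line, properly indented.
println(munit.Assertions.munitPrint(data))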

Related

Representing ad-hoc operator precedence in Scala Parser

So I'm trying to write a parser specifically for the arithmetic fragment of a programming language I'm playing with, using Scala's RegexParsers.
As it stands, my top-level expression parser is of the form:
parser: Parser[Exp] = binAppExp | otherKindsOfParserLike | lval | int
It accepts lvals (things like "a.b, a.b[c.d], a[b], {record=expression, like=this}") just fine. Now, I'd like to enable expressions like "1 + b / c = d", but potentially with (source-language, not Scala) compile-time user-defined operators.
My initial thought was, if I encode the operations recursively and numerically by precedence, then I could add higher precedences ad-hoc, and each level of precedence could parse consuming lower-precedence sub-terms on the right-side of the operation expression. So, I'm trying to build a toy of that idea with just some fairly common operators.
So I'd expect "1 * 2+1" to parse into something like Call(*, Seq(1, Call(+, Seq(2,1)))), where case class Call(functionName: String, args: Seq[Exp]) extends Exp.
Instead though, it parses as IntExp(1).
Is there a reason that this can't work (is it left-recursive in a way I'm missing? If so, I'm sure there's something else wrong, or it'd never terminate, right?), or is it just plain wrong for some other reason?
def binAppExp: Parser[Exp] = {
//assume a registry of operations
val ops = Map(
(7, Set("*", "/")),
(6, Set("-", "+")),
(4, Set("=", "!=", ">", "<", ">=", "<=")),
(3, Set("&")),
(2, Set("|"))
)
//relevant ops for a level of precedence
def opsWithPrecedence(n: Int): Set[String] = ops.getOrElse(n, Set.empty)
//parse an op with some level of precedence
def opWithPrecedence(n: Int): Parser[String] = ".+".r ^? (
{ case s if opsWithPrecedence(n).contains(s) => s },
{ case s => s"SYMBOL NOT FOUND: $s" }
)
//assuming the parse happens, encode it as an AST representation
def folder(h: Exp, t: LangParser.~[String, Exp]): CallExp =
CallExp(t._1, Seq(h, t._2))
val maxPrecedence: Int = ops.maxBy(_._1)._1
def term: (Int => Parser[Exp]) = {
case 0 => lval | int | notApp | "(" ~> term(maxPrecedence) <~ ")"
case n =>
val lowerTerm = term(n - 1)
lowerTerm ~ rep(opWithPrecedence(n) ~ lowerTerm) ^^ {
case h ~ ts => ts.foldLeft(h)(folder)
}
}
term(maxPrecedence)
}
Okay, so there was nothing inherently impossible with what I was trying to do, it was just wrong in the details.
The core idea is just: maintain a mapping from level of precedence to operators/parsers, and recursively look for parses based on that table. If you allow parenthetical expressions, just nest a call to your most precedent possible parser within the call to the parenthetical terms' parser.
Just in case anyone else ever wants to do something like this, here's code for a set of arithmetic/logical operators, heavily commented to relate it to the above:
def opExp: Parser[Exp] = {
sealed trait Assoc
val ops = Map(
(1, Set("*", "/")),
(2, Set("-", "+")),
(3, Set("=", "!=", ">", "<", ">=", "<=")),
(4, Set("&")),
(5, Set("|"))
)
def opsWithPrecedence(n: Int): Set[String] = ops.getOrElse(n, Set.empty)
/* before, this was trying to match the remainder of the expression,
so something like `3 - 2` would parse the Int(3),
and try to pass "- 2" as an operator to the op parser.
RegexParsers has an implicit def "literal : String => SubclassOfParser[String]",
that I'm using explicitly here.
*/
def opWithPrecedence(n: Int): Parser[String] = {
val ops = opsWithPrecedence(n)
if (ops.size > 1) {
ops.map(literal).fold (literal(ops.head)) {
case (l1, l2) => l1 | l2
}
} else if (ops.size == 1) {
literal(ops.head)
} else {
failure(s"No Ops for Precedence $n")
}
}
def folder(h: Exp, t: TigerParser.~[String, Exp]): CallExp = CallExp(t._1, Seq(h, t._2))
val maxPrecedence: Int = ops.maxBy(_._1)._1
def term: (Int => Parser[Exp]) = {
case 0 => lval | int | "(" ~> { term(maxPrecedence) } <~ ")"
case n if n > 0 =>
val lowerTerm = term(n - 1)
lowerTerm ~ rep(opWithPrecedence(n) ~ lowerTerm) ^^ {
case h ~ ts if ts.nonEmpty => ts.foldLeft(h)(folder)
case h ~ _ => h
}
}
term(maxPrecedence)
}
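A hypothetical check of the above, assuming the surrounding RegexParsers instance also defines lval, an int parser producing IntExp, and case class CallExp(name: String, args: Seq[Exp]) extends Exp as in the question:
// "*" is in the level-1 (tightest) set and "+" in level 2, so the fold
// groups the multiplication first.
val result = parseAll(opExp, "1 * 2 + 1")
// result.get should be:
// CallExp(+, Seq(CallExp(*, Seq(IntExp(1), IntExp(2))), IntExp(1)))
println(result)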

Simplify/DRY up a case statement in Scala for Twirl Templates

So I'm using Play Twirl templates (not within Play; this is an independent project) and I have some templates that generate database DDLs. The following works:
if(config.params.showDDL.isSupplied) {
print( BigSenseServer.config.options("dbms") match {
case "mysql" => txt.mysql(
BigSenseServer.config.options("dbDatabase"),
InetAddress.getLocalHost().getCanonicalHostName,
BigSenseServer.config.options("dboUser"),
BigSenseServer.config.options("dboPass"),
BigSenseServer.config.options("dbUser"),
BigSenseServer.config.options("dbPass")
)
case "pgsql" => txt.pgsql(
BigSenseServer.config.options("dbDatabase"),
InetAddress.getLocalHost().getCanonicalHostName,
BigSenseServer.config.options("dboUser"),
BigSenseServer.config.options("dboPass"),
BigSenseServer.config.options("dbUser"),
BigSenseServer.config.options("dbPass")
)
case "mssql" => txt.mssql$.MODULE$(
BigSenseServer.config.options("dbDatabase"),
InetAddress.getLocalHost().getCanonicalHostName,
BigSenseServer.config.options("dboUser"),
BigSenseServer.config.options("dboPass"),
BigSenseServer.config.options("dbUser"),
BigSenseServer.config.options("dbPass")
)
})
System.exit(0)
}
But I have a lot of repeated statements. If I try to assign the case to a variable and use the $.MODULE$ trick, I get an error saying my variable doesn't take parameters:
val b = BigSenseServer.config.options("dbms") match {
case "mysql" => txt.mysql$.MODULE$
case "pgsql" => txt.pgsql$.MODULE$
case "mssql" => txt.mssql$.MODULE$
}
b("string1","string2","string3","string4","string5","string6")
and the error:
BigSense/src/main/scala/io/bigsense/server/BigSenseServer.scala:32: play.twirl.api.BaseScalaTemplate[T,F] with play.twirl.api.Template6[A,B,C,D,E,F,Result] does not take parameters
What's the best way to simplify this Scala code?
EDIT: Final Solution using a combination of the answers below
The answers below suggest creating factory classes, but I really want to avoid that since I already have the Twirl-generated template objects. The partially applied functions gave me a better understanding of how to achieve this. It turns out all I needed to do was pick the apply methods and eta-expand them, if necessary in combination with partial function application. The following works great:
if(config.params.showDDL.isSupplied) {
print((config.options("dbms") match {
case "pgsql" =>
txt.pgsql.apply _
case "mssql" =>
txt.mssql.apply _
case "mysql" =>
txt.mysql.apply(InetAddress.getLocalHost().getCanonicalHostName,
_:String, _:String, _:String,_:String, _:String)
})(
config.options("dbDatabase"),
config.options("dboUser"),
config.options("dboPass"),
config.options("dbUser"),
config.options("dbPass")
))
System.exit(0)
}
You can try to use eta-expansion and partially applied functions.
Given a factory with some methods:
class Factory {
def mysql(i: Int, s: String) = s"x: $i/$s"
def pgsql(i: Int, s: String) = s"y: $i/$s"
def mssql(i: Int, j: Int, s: String) = s"z: $i/$j/$s"
}
You can abstract over the methods like this:
val factory = new Factory()
// Arguments required by all factory methods
val i = 5
val s = "Hello"
Seq("mysql", "pgsql", "mssql").foreach {
name =>
val f = name match {
case "mysql" =>
// Eta-expand: Convert method into function
factory.mysql _
case "pgsql" =>
factory.pgsql _
case "mssql" =>
// Argument for only one factory method
val j = 10
// Eta-expand, then apply function partially
factory.mssql(_ :Int, j, _: String)
}
// Fill in common arguments into the new function
val result = f(i, s)
println(name + " -> " + result)
}
As you can see in the "mssql" case, the arguments may even differ; yet the common arguments only need to be passed once. The foreach loop is just there to test each case; the code in its body shows how to partially apply a function.
You can try to do this by using tupled() to create a tupled version of the function.
object X {
def a(x : Int, y : Int, z : Int) = "A" + x + y + z
def b(x : Int, y : Int, z : Int) = "B" + x + y + z
def c(x : Int, y : Int, z : Int) = "C" + x + y + z
}
val selectedFunc = X.a _
selectedFunc.tupled((1, 2, 3)) //returns A123
More specifically, you would store your parameters in a tuple:
val params = (BigSenseServer.config.options("dbDatabase"),
InetAddress.getLocalHost().getCanonicalHostName) //etc.
and then in your match statement:
case "mysql" => (txt.mysql _).tupled(params)

Filtering inside `for` with pattern matching

I am reading a TSV file using something like this:
case class Entry(entryType: Int, value: Int)
def filterEntries(): Iterator[Entry] = {
for {
line <- scala.io.Source.fromFile("filename").getLines()
} yield new Entry(line.split("\t").map(x => x.toInt))
}
Now I am interested both in filtering out entries whose entryType is set to 0 and in ignoring lines with a column count greater or less than 2 (i.e. lines that do not match the constructor). I was wondering if there's an idiomatic way to achieve this, maybe using pattern matching and an unapply method in a companion object. The only thing I can think of is using .filter on the resulting iterator.
I will also accept a solution not involving a for loop, as long as it returns Iterator[Entry]. The solutions must be tolerant of malformed input.
This is more state-of-arty:
package object liner {
implicit class R(val sc: StringContext) {
object r {
def unapplySeq(s: String): Option[Seq[String]] = sc.parts.mkString.r unapplySeq s
}
}
}
package liner {
case class Entry(entryType: Int, value: Int)
object I {
def unapply(s: String): Option[Int] = util.Try(s.toInt).toOption
}
object Test extends App {
def lines = List("1 2", "3", "", " 4 5 ", "junk", "0, 100000", "6 7 8")
def entries = lines flatMap {
case r"""\s*${I(i)}(\d+)\s+${I(j)}(\d+)\s*""" if i != 0 => Some(Entry(i, j))
case __________________________________________________ => None
}
Console println entries
}
}
Hopefully, the regex interpolator will make it into the standard distro soon, but this shows how easy it is to rig up. Also hopefully, a scanf-style interpolator will allow easy extraction with case f"$i%d".
I just started using the "elongated wildcard" in patterns to align the arrows.
There is a pupal or maybe larval regex macro:
https://github.com/som-snytt/regextractor
You can create variables in the head of the for-comprehension and then use a guard:
edit: ensure length of array
for {
line <- scala.io.Source.fromFile("filename").getLines()
arr = line.split("\t").map(x => x.toInt)
if arr.size == 2 && arr(0) != 0
} yield new Entry(arr(0), arr(1))
I have solved it using the following code:
import scala.util.{Try, Success}
val lines = List(
"1\t2",
"1\t",
"2",
"hello",
"1\t3"
)
case class Entry(val entryType: Int, val value: Int)
object Entry {
def unapply(line: String) = {
line.split("\t").map(x => Try(x.toInt)) match {
case Array(Success(entryType: Int), Success(value: Int)) => Some(Entry(entryType, value))
case _ =>
println("Malformed line: " + line)
None
}
}
}
for {
line <- lines
entryOption = Entry.unapply(line)
if entryOption.isDefined
} yield entryOption.get
The left hand side of a <- or = in a for-loop may be a fully-fledged pattern. So you may write this:
def filterEntries(): Iterator[Entry] = for {
  line <- scala.io.Source.fromFile("filename").getLines()
  arr = line.split("\t").map(x => x.toInt)
  if arr.size == 2
  // now you may use pattern matching to extract the array
  Array(entryType, value) = arr
  if entryType != 0
} yield Entry(entryType, value)
Note that this solution will throw a NumberFormatException if a field is not convertible to an Int. If you do not want that, you'll have to encapsulate x.toInt with a Try and pattern match again.
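For instance, a sketch of that Try-based variant (same Entry and file as above); malformed numbers are simply filtered out instead of throwing:
import scala.util.{Success, Try}

def filterEntries(): Iterator[Entry] = for {
  line <- scala.io.Source.fromFile("filename").getLines()
  cols = line.split("\t").map(x => Try(x.toInt))
  // keep only well-formed two-column lines
  if cols.length == 2 && cols.forall(_.isSuccess)
  Array(Success(entryType), Success(value)) = cols
  if entryType != 0
} yield Entry(entryType, value)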

Rewrite string modifications more functional

I'm reading lines from a file
for (line <- Source.fromFile("test.txt").getLines) {
....
}
I basically want to get a list of paragraphs in the end. If a line is empty, that starts a new paragraph, and I might want to parse some keyword-value pairs in the future.
The text file contains a list of entries like this (or something similar, like an Ini file)
User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.
User=....
And I basically want to have a List[Project] where Project looks something like
class Project (val User: String, val Name:String, val Desc: String) {}
And the Description is that big chunk of text that doesn't start with a <keyword>=, but can stretch over any number of lines.
I know how to do this in an iterative style. Just do a list of checks for the keywords, and populate an instance of a class, and add it to a list to return later.
But I think it should be possible to do this in proper functional style, possibly with match case, yield and recursion, resulting in a list of objects that have the fields User, Project and so on. The class used is known, as are all the keywords, and the file format is not set in stone either. I'm mostly trying to learn better functional style.
You're obviously parsing something, so it might be the time to use... a parser!
Since your language seems to treat line breaks as significant, you will need to refer to this question to tell the parser so.
Apart from that, a rather simple implementation would be
import scala.util.parsing.combinator.RegexParsers
case class Project(user: String, name: String, description: String)
object ProjectParser extends RegexParsers {
override val whiteSpace = """[ \t]+""".r
def eol : Parser[String] = """\r?\n""".r
def user: Parser[String] = "User=" ~> """[^\n]*""".r <~ eol
def name: Parser[String] = "Project=" ~> """[^\n]*""".r <~ eol
def description: Parser[String] = repsep("""[^\n]+""".r, eol) ^^ { case l => l.mkString("\n") }
def project: Parser[Project] = user ~ name ~ description ^^ { case a ~ b ~ c => Project(a, b, c) }
def projects: Parser[List[Project]] = repsep(project,eol ~ eol)
}
And how to use it:
val sample = """User=foo1
Project=bar1
desc1
desc2
desc3
User=foo
Project=bar
desc4 desc5 desc6
desc7 desc8 desc9"""
import scala.util.parsing.input._
val reader = new CharSequenceReader(sample)
val res = ProjectParser.parseAll(ProjectParser.projects, reader)
if(res.successful) {
print("Found projects: " + res.get)
} else {
print(res)
}
Another possible implementation (since this parser is rather simple), using recursion:
import scala.io.Source
case class Project(user: String, name: String, desc: String)
@scala.annotation.tailrec
def parse(source: Iterator[String], list: List[Project] = Nil): List[Project] = {
val emptyProject = Project("", "", "")
@scala.annotation.tailrec
def parseProject(project: Option[Project] = None): Option[Project] = {
if(source.hasNext) {
val line = source.next
if(!line.isEmpty) {
val splitted = line.span(_ != '=')
parseProject(splitted match {
case (h, t) if h == "User" => project.orElse(Some(emptyProject)).map(_.copy(user = t.drop(1)))
case (h, t) if h == "Project" => project.orElse(Some(emptyProject)).map(_.copy(name = t.drop(1)))
case _ => project.orElse(Some(emptyProject)).map(project => project.copy(desc = (if(project.desc.isEmpty) "" else project.desc ++ "\n") ++ line))
})
} else project
} else project
}
if(source.hasNext) {
parse(source, parseProject().map(_ :: list).getOrElse(list))
} else list.reverse
}
And the test:
object Test {
def source = Source.fromString("""User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.
User=Plop
Project=SO
Some desc""")
def test = println(parse(source.getLines))
}
Which gives:
List(Project(Hans,Blow up the moon,The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.), Project(Plop,SO,Some desc))
To answer your question without also tackling keyword parsing, fold over the lines and aggregate lines unless it's an empty one, in which case you start a new empty paragraph.
lines.foldLeft(List("")) { (l, x) =>
if (x.isEmpty) "" :: l else (l.head + "\n" + x) :: l.tail
} reverse
You'll notice this has some wrinkles in how it handles zero lines, and multiple and trailing empty lines. Adapt to your needs. Also if you are anal about string concatenations you can collect them in a nested list and flatten in the end (using .map(_.mkString)), this is just to showcase the basic technique of folding a sequence not to a scalar but to a new sequence.
This builds a list in reverse order because list prepend (::) is more efficient than appending to l in each step.
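A sketch of that nested-list variant (with the same wrinkles as above): each paragraph is accumulated as a List[String] and only joined once at the end.
lines.foldLeft(List(List.empty[String])) { (l, x) =>
  if (x.isEmpty) Nil :: l else (x :: l.head) :: l.tail
}.reverse.map(_.reverse.mkString("\n"))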
You're obviously building something, so you might want to try... a builder!
Like Jürgen, my first thought was to fold, where you're accumulating a result.
A mutable.Builder does the accumulation mutably, with a collection.generic.CanBuildFrom to indicate the builder to use to make a target collection from a source collection. You keep the mutable thing around just long enough to get a result. So that's my plug for localized mutability. Lest one assume that the path from List[String] to List[Project] is immutable.
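For illustration, a minimal sketch of that localized-mutability idea (a ListBuffer is itself a Builder); the actual code below uses a fold instead:
import scala.collection.mutable.ListBuffer

def paragraphs(lines: Iterator[String]): List[List[String]] = {
  val out = ListBuffer.empty[List[String]]   // accumulates finished paragraphs
  val cur = ListBuffer.empty[String]         // accumulates the current paragraph
  for (line <- lines) {
    if (line.isEmpty) {
      if (cur.nonEmpty) { out += cur.toList; cur.clear() }
    } else cur += line
  }
  if (cur.nonEmpty) out += cur.toList
  out.toList
}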
To the other fine answers (the ones with non-negative appreciation ratings), I would add that functional style means functional decomposition, and usually small functions.
If you're not using regex parsers, don't neglect regexes in your pattern matches.
And try to spare the dots. In fact, I believe that tomorrow is a Spare the Dots Day, and people with sensitivity to dots are advised to remain indoors.
case class Project(user: String, name: String, description: String)
trait Sample {
val sample = """
|User=Hans
|Project=Blow up the moon
|The slugs are going to eat the mustard. // multiline possible!
|They are sneaky bastards, those slugs.
|
|User=Bob
|I haven't thought up a project name yet.
|
|User=Greta
|Project=Burn the witch
|It's necessary to escape from the witch before
|we blow up the moon. I hope Hans sees it my way.
|Once we burn the bitch, I mean witch, we can
|wreak whatever havoc pleases us.
|""".stripMargin
}
object Test extends App with Sample {
val kv = "(.*?)=(.*)".r
def nonnully(s: String) = if (s == null) "" else s + " "
val empty = Project(null, null, null)
val (res, dummy) = ((List.empty[Project], empty) /: sample.lines) { (acc, line) =>
val (sofar, cur) = acc
line match {
case kv("User", u) => (sofar, cur copy (user = u))
case kv("Project", n) => (sofar, cur copy (name = n))
case kv(k, _) => sys error s"Bad keyword $k"
case x if x.nonEmpty => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
case _ if cur != empty => (cur :: sofar, empty)
case _ => (sofar, empty)
}
}
val ps = if (dummy == empty) res.reverse else (dummy :: res).reverse
Console println ps
}
The match can be mashed this way, too:
val (res, dummy) = ((List.empty[Project], empty) /: sample.lines) {
case ((sofar, cur), kv("User", u)) => (sofar, cur copy (user = u))
case ((sofar, cur), kv("Project", n)) => (sofar, cur copy (name = n))
case ((sofar, cur), kv(k, _)) => sys error s"Bad keyword $k"
case ((sofar, cur), x) if x.nonEmpty => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
case ((sofar, cur), _) if cur != empty => (cur :: sofar, empty)
case ((sofar, cur), _) => (sofar, empty)
}
Before the fold, it seemed simpler to do paragraphs first. Is that imperative thinking?
object Test0 extends App with Sample {
def grafs(ss: Iterator[String]): List[List[String]] = {
val (g, rest) = ss dropWhile (_.isEmpty) span (_.nonEmpty)
val others = if (rest.nonEmpty) grafs(rest) else Nil
g.toList :: others
}
def toProject(ss: List[String]): Project = {
var p = Project("", "", "")
for (line <- ss; parts = line split '=') parts match {
case Array("User", u) => p = p.copy(user = u)
case Array("Project", n) => p = p.copy(name = n)
case Array(k, _) => sys error s"Bad keyword $k"
case Array(text) => p = p.copy(description = s"${p.description} $text")
}
p
}
val ps = grafs(sample.lines) map toProject
Console println ps
}
class Project (val User: String, val Name:String, val Desc: String) {}
object Project {
def apply(str: String): Project = {
val user = somehowFetchUserName(str)
val name = somehowFetchProjectName(str)
val desc = somehowFetchDescription(str)
new Project(user, name, desc)
}
}
val contents: Array[String] = Source.fromFile("test.txt").mkString.split("\\n\\n")
val list = contents map(Project(_))
This will end up with the list of projects.
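One possible (purely hypothetical) way to fill in the somehowFetch* placeholders, assuming the User=/Project= format from the question:
def somehowFetchUserName(str: String): String =
  str.split("\n").collectFirst { case s if s.startsWith("User=") => s.stripPrefix("User=") }.getOrElse("")

def somehowFetchProjectName(str: String): String =
  str.split("\n").collectFirst { case s if s.startsWith("Project=") => s.stripPrefix("Project=") }.getOrElse("")

def somehowFetchDescription(str: String): String =
  str.split("\n").filterNot(s => s.startsWith("User=") || s.startsWith("Project=")).mkString("\n")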

Scala: how to traverse stream/iterator collecting results into several different collections

I'm going through a log file that is too big to fit into memory, collecting 2 types of expressions. What is a better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
val lines : Iterator[String] = io.Source.fromFile(file).getLines()
val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty
for (line <- lines){
line match {
case errorPat(date,ip)=> errors.append((ip,date))
case loginPat(date,user,ip,id) =>logins.put(ip, id)
case _ => ""
}
}
errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}
Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
val lines = Source.fromFile(file).getLines
val (err, log) = lines.collect {
case errorPat(inf, ip) => (Some((ip, inf)), None)
case loginPat(_, _, ip, id) => (None, Some((ip, id)))
}.toList.unzip
val ip2id = log.flatten.toMap
err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + " " + ip, inf) }
}
Corrections:
1) removed unnecessary type declarations
2) tuple deconstruction instead of ugly ._1
3) left fold instead of mutable accumulators
4) used the more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
val lines = io.Source.fromFile(file).getLines()
val (logins, errors) =
((Map.empty[String, String], Seq.empty[(String, String)]) /: lines) {
case ((loginsAcc, errorsAcc), next) =>
next match {
case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id) , errorsAcc)
case _ => (loginsAcc, errorsAcc)
}
}
// more concise equivalent for
// errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (it includes some auxiliary code for IteratorEnumerator) and it may be a bit overkill for the task, but perhaps someone will find it helpful.
import java.io._;
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._
object MyApp extends App {
// A type for the result. Having names keeps things
// clearer and shorter.
type LogResult = List[(String,String)]
// Represents a state of our computation. Not only it
// gives a name to the data, we can also put here
// functions that modify the state. This nicely
// separates what we're computing and how.
sealed case class State(
logins: Map[String,String],
errors: Seq[(String,String)]
) {
def this() = {
this(Map.empty[String,String], Seq.empty[(String,String)])
}
def addError(date: String, ip: String): State =
State(logins, errors :+ (ip -> date));
def addLogin(ip: String, id: String): State =
State(logins + (ip -> id), errors);
// Produce the final result from accumulated data.
def result: LogResult =
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
// An iteratee that consumes lines of our input. Based
// on the given regular expressions, it produces an
// iteratee that parses the input and uses State to
// compute the result.
def logIteratee(errorPat: Regex, loginPat: Regex):
IterV[String,List[(String,String)]] = {
// Consumes a single line.
def consume(line: String, state: State): State =
line match {
case errorPat(date, ip) => state.addError(date, ip);
case loginPat(date, user, ip, id) => state.addLogin(ip, id);
case _ => state
}
// The core of the iteratee. Every time we consume a
// line, we update our state. When done, compute the
// final result.
def step(state: State)(s: Input[String]): IterV[String, LogResult] =
s(el = line => Cont(step(consume(line, state))),
empty = Cont(step(state)),
eof = Done(state.result, EOF[String]))
// Return the iteratee waiting for its first input.
Cont(step(new State()));
}
// Converts an iterator into an enumerator. This
// should probably be moved to Scalaz.
// Adapted from scalaz.ExampleIteratee
implicit val IteratorEnumerator = new Enumerator[Iterator] {
@annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
val next: Option[(Iterator[E], IterV[E, A])] =
if (e.hasNext) {
val x = e.next();
i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
} else
None;
next match {
case None => i
case Some((es, is)) => apply(es, is)
}
}
}
// main ---------------------------------------------------
{
// Read a file as an iterator of lines:
// val lines: Iterator[String] =
// io.Source.fromFile("test.log").getLines();
// Create our testing iterator:
val lines: Iterator[String] = Seq(
"Error: 2012/03 1.2.3.4",
"Login: 2012/03 user 1.2.3.4 Joe",
"Error: 2012/03 1.2.3.5",
"Error: 2012/04 1.2.3.4"
).iterator;
// Create an iteratee.
val iter = logIteratee("Error: (\\S+) (\\S+)".r,
"Login: (\\S+) (\\S+) (\\S+) (\\S+)".r);
// Run the iteratee against the input
// (the enumerator is implicit)
println(iter(lines).run);
}
}