Rewrite string modifications more functional - scala

I'm reading lines from a file
for (line <- Source.fromFile("test.txt").getLines) {
....
}
I basically want to get a list of paragraphs in the end. If a line is empty, that starts as a new paragraph, and I might want to parse some keyword - value pairs in the future.
The text file contains a list of entries like this (or something similar, like an Ini file)
User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.
User=....
And I basically want to have a List[Project] where Project looks something like
class Project (val User: String, val Name:String, val Desc: String) {}
And the Description is that big chunk of text that doesn't start with a <keyword>=, but can stretch over any number of lines.
I know how to do this in an iterative style. Just do a list of checks for the keywords, and populate an instance of a class, and add it to a list to return later.
But I think it should be possible to do this in proper functional style, possibly with match case, yield and recursion, resulting in a list of objects that have the fields User, Project and so on. The class used is known, as are all the keywords, and the file format is not set in stone either. I'm mostly trying to learn better functional style.

You're obviously parsing something, so it might be the time to use... a parser!
Since your language seems to treat line breaks as significant, you will need to refer to this question to tell the parser so.
Apart from that, a rather simple implementation would be
import scala.util.parsing.combinator.RegexParsers
case class Project(user: String, name: String, description: String)
object ProjectParser extends RegexParsers {
override val whiteSpace = """[ \t]+""".r
def eol : Parser[String] = """\r?\n""".r
def user: Parser[String] = "User=" ~> """[^\n]*""".r <~ eol
def name: Parser[String] = "Project=" ~> """[^\n]*""".r <~ eol
def description: Parser[String] = repsep("""[^\n]+""".r, eol) ^^ { case l => l.mkString("\n") }
def project: Parser[Project] = user ~ name ~ description ^^ { case a ~ b ~ c => Project(a, b, c) }
def projects: Parser[List[Project]] = repsep(project,eol ~ eol)
}
And how to use it:
val sample = """User=foo1
Project=bar1
desc1
desc2
desc3
User=foo
Project=bar
desc4 desc5 desc6
desc7 desc8 desc9"""
import scala.util.parsing.input._
val reader = new CharSequenceReader(sample)
val res = ProjectParser.parseAll(ProjectParser.projects, reader)
if(res.successful) {
print("Found projects: " + res.get)
} else {
print(res)
}

Another possible implementation (since this parser is rather simple), using recursion:
import scala.io.Source
case class Project(user: String, name: String, desc: String)
#scala.annotation.tailrec
def parse(source: Iterator[String], list: List[Project] = Nil): List[Project] = {
val emptyProject = Project("", "", "")
#scala.annotation.tailrec
def parseProject(project: Option[Project] = None): Option[Project] = {
if(source.hasNext) {
val line = source.next
if(!line.isEmpty) {
val splitted = line.span(_ != '=')
parseProject(splitted match {
case (h, t) if h == "User" => project.orElse(Some(emptyProject)).map(_.copy(user = t.drop(1)))
case (h, t) if h == "Project" => project.orElse(Some(emptyProject)).map(_.copy(name = t.drop(1)))
case _ => project.orElse(Some(emptyProject)).map(project => project.copy(desc = (if(project.desc.isEmpty) "" else project.desc ++ "\n") ++ line))
})
} else project
} else project
}
if(source.hasNext) {
parse(source, parseProject().map(_ :: list).getOrElse(list))
} else list.reverse
}
And the test:
object Test {
def source = Source.fromString("""User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.
User=Plop
Project=SO
Some desc""")
def test = println(parse(source.getLines))
}
Which gives:
List(Project(Hans,Blow up the moon,The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.), Project(Plop,SO,Some desc))

To answer your question without also tackling keyword parsing, fold over the lines and aggregate lines unless it's an empty one, in which case you start a new empty paragraph.
lines.foldLeft(List("")) { (l, x) =>
if (x.isEmpty) "" :: l else (l.head + "\n" + x) :: l.tail
} reverse
You'll notice this has some wrinkles in how it handles zero lines, and multiple and trailing empty lines. Adapt to your needs. Also if you are anal about string concatenations you can collect them in a nested list and flatten in the end (using .map(_.mkString)), this is just to showcase the basic technique of folding a sequence not to a scalar but to a new sequence.
This builds a list in reverse order because list prepend (::) is more efficient than appending to l in each step.

You're obviously building something, so you might want to try... a builder!
Like Jürgen, my first thought was to fold, where you're accumulating a result.
A mutable.Builder does the accumulation mutably, with a collection.generic.CanBuildFrom to indicate the builder to use to make a target collection from a source collection. You keep the mutable thing around just long enough to get a result. So that's my plug for localized mutability. Lest one assume that the path from List[String] to List[Project] is immutable.
To the other fine answers (the ones with non-negative appreciation ratings), I would add that functional style means functional decomposition, and usually small functions.
If you're not using regex parsers, don't neglect regexes in your pattern matches.
And try to spare the dots. In fact, I believe that tomorrow is a Spare the Dots Day, and people with sensitivity to dots are advised to remain indoors.
case class Project(user: String, name: String, description: String)
trait Sample {
val sample = """
|User=Hans
|Project=Blow up the moon
|The slugs are going to eat the mustard. // multiline possible!
|They are sneaky bastards, those slugs.
|
|User=Bob
|I haven't thought up a project name yet.
|
|User=Greta
|Project=Burn the witch
|It's necessary to escape from the witch before
|we blow up the moon. I hope Hans sees it my way.
|Once we burn the bitch, I mean witch, we can
|wreak whatever havoc pleases us.
|""".stripMargin
}
object Test extends App with Sample {
val kv = "(.*?)=(.*)".r
def nonnully(s: String) = if (s == null) "" else s + " "
val empty = Project(null, null, null)
val (res, dummy) = ((List.empty[Project], empty) /: sample.lines) { (acc, line) =>
val (sofar, cur) = acc
line match {
case kv("User", u) => (sofar, cur copy (user = u))
case kv("Project", n) => (sofar, cur copy (name = n))
case kv(k, _) => sys error s"Bad keyword $k"
case x if x.nonEmpty => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
case _ if cur != empty => (cur :: sofar, empty)
case _ => (sofar, empty)
}
}
val ps = if (dummy == empty) res.reverse else (dummy :: res).reverse
Console println ps
}
The match can be mashed this way, too:
val (res, dummy) = ((List.empty[Project], empty) /: sample.lines) {
case ((sofar, cur), kv("User", u)) => (sofar, cur copy (user = u))
case ((sofar, cur), kv("Project", n)) => (sofar, cur copy (name = n))
case ((sofar, cur), kv(k, _)) => sys error s"Bad keyword $k"
case ((sofar, cur), x) if x.nonEmpty => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
case ((sofar, cur), _) if cur != empty => (cur :: sofar, empty)
case ((sofar, cur), _) => (sofar, empty)
}
Before the fold, it seemed simpler to do paragraphs first. Is that imperative thinking?
object Test0 extends App with Sample {
def grafs(ss: Iterator[String]): List[List[String]] = {
val (g, rest) = ss dropWhile (_.isEmpty) span (_.nonEmpty)
val others = if (rest.nonEmpty) grafs(rest) else Nil
g.toList :: others
}
def toProject(ss: List[String]): Project = {
var p = Project("", "", "")
for (line <- ss; parts = line split '=') parts match {
case Array("User", u) => p = p.copy(user = u)
case Array("Project", n) => p = p.copy(name = n)
case Array(k, _) => sys error s"Bad keyword $k"
case Array(text) => p = p.copy(description = s"${p.description} $text")
}
p
}
val ps = grafs(sample.lines) map toProject
Console println ps
}

class Project (val User: String, val Name:String, val Desc: String) {}
object Project {
def apply(str: String): Project = {
val user = somehowFetchUserName(str)
val name = somehowFetchProjectName(str)
val desc = somehowFetchDescription(str)
new Project(user, name, desc)
}
}
val contents: Array[String] = Source.fromFile("test.txt").mkString.split("\\n\\n")
val list = contents map(Project(_))
will end up with the list of projects.

Related

Simple functionnal way for grouping successive elements? [duplicate]

I'm trying to 'group' a string into segments, I guess this example would explain it more succintly
scala> val str: String = "aaaabbcddeeeeeeffg"
... (do something)
res0: List("aaaa","bb","c","dd","eeeee","ff","g")
I can thnk of a few ways to do this in an imperative style (with vars and stepping through the string to find groups) but I was wondering if any better functional solution could
be attained? I've been looking through the Scala API but there doesn't seem to be something that fits my needs.
Any help would be appreciated
You can split the string recursively with span:
def s(x : String) : List[String] = if(x.size == 0) Nil else {
val (l,r) = x.span(_ == x(0))
l :: s(r)
}
Tail recursive:
#annotation.tailrec def s(x : String, y : List[String] = Nil) : List[String] = {
if(x.size == 0) y.reverse
else {
val (l,r) = x.span(_ == x(0))
s(r, l :: y)
}
}
Seems that all other answers are very concentrated on collection operations. But pure string + regex solution is much simpler:
str split """(?<=(\w))(?!\1)""" toList
In this regex I use positive lookbehind and negative lookahead for the captured char
def group(s: String): List[String] = s match {
case "" => Nil
case s => s.takeWhile(_==s.head) :: group(s.dropWhile(_==s.head))
}
Edit: Tail recursive version:
def group(s: String, result: List[String] = Nil): List[String] = s match {
case "" => result reverse
case s => group(s.dropWhile(_==s.head), s.takeWhile(_==s.head) :: result)
}
can be used just like the other because the second parameter has a default value and thus doesnt have to be supplied.
Make it one-liner:
scala> val str = "aaaabbcddddeeeeefff"
str: java.lang.String = aaaabbcddddeeeeefff
scala> str.groupBy(identity).map(_._2)
res: scala.collection.immutable.Iterable[String] = List(eeeee, fff, aaaa, bb, c, dddd)
UPDATE:
As #Paul mentioned about the order here is updated version:
scala> str.groupBy(identity).toList.sortBy(_._1).map(_._2)
res: List[String] = List(aaaa, bb, c, dddd, eeeee, fff)
You could use some helper functions like this:
val str = "aaaabbcddddeeeeefff"
def zame(chars:List[Char]) = chars.partition(_==chars.head)
def q(chars:List[Char]):List[List[Char]] = chars match {
case Nil => Nil
case rest =>
val (thesame,others) = zame(rest)
thesame :: q(others)
}
q(str.toList) map (_.mkString)
This should do the trick, right? No doubt it can be cleaned up into one-liners even further
A functional* solution using fold:
def group(s : String) : Seq[String] = {
s.tail.foldLeft(Seq(s.head.toString)) { case (carry, elem) =>
if ( carry.last(0) == elem ) {
carry.init :+ (carry.last + elem)
}
else {
carry :+ elem.toString
}
}
}
There is a lot of cost hidden in all those sequence operations performed on strings (via implicit conversion). I guess the real complexity heavily depends on the kind of Seq strings are converted to.
(*) Afaik all/most operations in the collection library depend in iterators, an imho inherently unfunctional concept. But the code looks functional, at least.
Starting Scala 2.13, List is now provided with the unfold builder which can be combined with String::span:
List.unfold("aaaabbaaacdeeffg") {
case "" => None
case rest => Some(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
or alternatively, coupled with Scala 2.13's Option#unless builder:
List.unfold("aaaabbaaacdeeffg") {
rest => Option.unless(rest.isEmpty)(rest.span(_ == rest.head))
}
// List[String] = List("aaaa", "bb", "aaa", "c", "d", "ee", "ff", "g")
Details:
Unfold (def unfold[A, S](init: S)(f: (S) => Option[(A, S)]): List[A]) is based on an internal state (init) which is initialized in our case with "aaaabbaaacdeeffg".
For each iteration, we span (def span(p: (Char) => Boolean): (String, String)) this internal state in order to find the prefix containing the same symbol and produce a (String, String) tuple which contains the prefix and the rest of the string. span is very fortunate in this context as it produces exactly what unfold expects: a tuple containing the next element of the list and the new internal state.
The unfolding stops when the internal state is "" in which case we produce None as expected by unfold to exit.
Edit: Have to read more carefully. Below is no functional code.
Sometimes, a little mutable state helps:
def group(s : String) = {
var tmp = ""
val b = Seq.newBuilder[String]
s.foreach { c =>
if ( tmp != "" && tmp.head != c ) {
b += tmp
tmp = ""
}
tmp += c
}
b += tmp
b.result
}
Runtime O(n) (if segments have at most constant length) and tmp.+= probably creates the most overhead. Use a string builder instead for strict runtime in O(n).
group("aaaabbcddeeeeeeffg")
> Seq[String] = List(aaaa, bb, c, dd, eeeeee, ff, g)
If you want to use scala API you can use the built in function for that:
str.groupBy(c => c).values
Or if you mind it being sorted and in a list:
str.groupBy(c => c).values.toList.sorted

Counting pattern in scala list

My list looks like the following: List(Person,Invite,Invite,Person,Invite,Person...). I am trying to match based on a inviteCountRequired, meaning that the Invite objects following the Person object in the list is variable. What is the best way of doing this? My match code so far looks like this:
aList match {
case List(Person(_,_,_),Invitee(_,_,_),_*) => ...
case _ => ...
}
First stack question, please go easy on me.
Let
val aList = List(Person(1), Invite(2), Invite(3), Person(2), Invite(4), Person(3), Invite(6), Invite(7))
Then index each location in the list and select Person instances,
val persons = (aList zip Stream.from(0)).filter {_._1.isInstanceOf[Person]}
namely, List((Person(1),0), (Person(2),3), (Person(3),5)) . Define then sublists where the lower bound corresponds to a Person instance,
val intervals = persons.map{_._2}.sliding(2,1).toArray
res31: Array[List[Int]] = Array(List(0, 3), List(3, 5))
Construct sublists,
val latest = aList.drop(intervals.last.last) // last Person and Invitees not paired
val associations = intervals.map { case List(pa,pb,_*) => b.slice(pa,pb) } ++ latest
Hence the result looks like
Array(List(Person(1), Invite(2), Invite(3)), List(Person(2), Invite(4)), List(Person(3), Invite(6), Invite(7)))
Now,
associations.map { a =>
val person = a.take(1)
val invitees = a.drop(1)
// ...
}
This approach may be seen as a variable size sliding.
Thanks for your tips. I ended up creating another case class:
case class BallotInvites(person:Person,invites:List[Any])
Then, I populated it from the original list:
def constructBallotList(ballots:List[Any]):List[BallotInvites] ={
ballots.zipWithIndex.collect {
case (iv:Ballot,i) =>{
BallotInvites(iv,
ballots.distinct.takeRight(ballots.distinct.length-(i+1)).takeWhile({
case y:Invitee => true
case y:Person =>true
case y:Ballot => false})
)}
}}
val l = Ballot.constructBallotList(ballots)
Then to count based on inviteCountRequired, I did the following:
val count = l.count(b=>if ((b.invites.count(x => x.isInstanceOf[Person]) / contest.inviteCountRequired)>0) true else false )
I am not sure I understand the domain but you should only need to iterate once to construct a list of person + invites tuple.
sealed trait PorI
case class P(i: Int) extends PorI
case class I(i: Int) extends PorI
val l: List[PorI] = List(P(1), I(1), I(1), P(2), I(2), P(3), P(4), I(4))
val res = l.foldLeft(List.empty[(P, List[I])])({ case (res, t) =>
t match {
case p # P(_) => (p, List.empty[I]) :: res
case i # I(_) => {
val head :: tail = res
(head._1, i :: head._2) :: tail
}
}
})
res // List((P(4),List(I(4))), (P(3),List()), (P(2),List(I(2))), (P(1),List(I(1), I(1))))

Scala - how to print case classes like (pretty printed) tree

I'm making a parser with Scala Combinators. It is awesome. What I end up with is a long list of entagled case classes, like: ClassDecl(Complex,List(VarDecl(Real,float), VarDecl(Imag,float))), just 100x longer. I was wondering if there is a good way to print case classes like these in a tree-like fashion so that it's easier to read..? (or some other form of Pretty Print)
ClassDecl
name = Complex
fields =
- VarDecl
name = Real
type = float
- VarDecl
name = Imag
type = float
^ I want to end up with something like this
edit: Bonus question
Is there also a way to show the name of the parameter..? Like: ClassDecl(name=Complex, fields=List( ... ) ?
Check out a small extensions library named sext. It exports these two functions exactly for purposes like that.
Here's how it can be used for your example:
object Demo extends App {
import sext._
case class ClassDecl( kind : Kind, list : List[ VarDecl ] )
sealed trait Kind
case object Complex extends Kind
case class VarDecl( a : Int, b : String )
val data = ClassDecl(Complex,List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))
println("treeString output:\n")
println(data.treeString)
println()
println("valueTreeString output:\n")
println(data.valueTreeString)
}
Following is the output of this program:
treeString output:
ClassDecl:
- Complex
- List:
| - VarDecl:
| | - 1
| | - abcd
| - VarDecl:
| | - 2
| | - efgh
valueTreeString output:
- kind:
- list:
| - - a:
| | | 1
| | - b:
| | | abcd
| - - a:
| | | 2
| | - b:
| | | efgh
Starting Scala 2.13, case classes (which are an implementation of Product) are now provided with a productElementNames method which returns an iterator over their field's names.
Combined with Product::productIterator which provides the values of a case class, we have a simple way to pretty print case classes without requiring reflection:
def pprint(obj: Any, depth: Int = 0, paramName: Option[String] = None): Unit = {
val indent = " " * depth
val prettyName = paramName.fold("")(x => s"$x: ")
val ptype = obj match { case _: Iterable[Any] => "" case obj: Product => obj.productPrefix case _ => obj.toString }
println(s"$indent$prettyName$ptype")
obj match {
case seq: Iterable[Any] =>
seq.foreach(pprint(_, depth + 1))
case obj: Product =>
(obj.productIterator zip obj.productElementNames)
.foreach { case (subObj, paramName) => pprint(subObj, depth + 1, Some(paramName)) }
case _ =>
}
}
which for your specific scenario:
// sealed trait Kind
// case object Complex extends Kind
// case class VarDecl(a: Int, b: String)
// case class ClassDecl(kind: Kind, decls: List[VarDecl])
val data = ClassDecl(Complex, List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))
pprint(data)
produces:
ClassDecl
kind: Complex
decls:
VarDecl
a: 1
b: abcd
VarDecl
a: 2
b: efgh
Use the com.lihaoyi.pprint library.
libraryDependencies += "com.lihaoyi" %% "pprint" % "0.4.1"
val data = ...
val str = pprint.tokenize(data).mkString
println(str)
you can also configure width, height, indent and colors:
pprint.tokenize(data, width = 80).mkString
Docs: https://github.com/com-lihaoyi/PPrint
Here's my solution which greatly improves how http://www.lihaoyi.com/PPrint/ handles the case-classes (see https://github.com/lihaoyi/PPrint/issues/4 ).
e.g. it prints this:
for such a usage:
pprint2 = pprint.copy(additionalHandlers = pprintAdditionalHandlers)
case class Author(firstName: String, lastName: String)
case class Book(isbn: String, author: Author)
val b = Book("978-0486282114", Author("first", "last"))
pprint2.pprintln(b)
code:
import pprint.{PPrinter, Tree, Util}
object PPrintUtils {
// in scala 2.13 this would be even simpler/cleaner due to added product.productElementNames
protected def caseClassToMap(cc: Product): Map[String, Any] = {
val fieldValues = cc.productIterator.toSet
val fields = cc.getClass.getDeclaredFields.toSeq
.filterNot(f => f.isSynthetic || java.lang.reflect.Modifier.isStatic(f.getModifiers))
fields.map { f =>
f.setAccessible(true)
f.getName -> f.get(cc)
}.filter { case (k, v) => fieldValues.contains(v) }
.toMap
}
var pprint2: PPrinter = _
protected def pprintAdditionalHandlers: PartialFunction[Any, Tree] = {
case x: Product =>
val className = x.getClass.getName
// see source code for pprint.treeify()
val shouldNotPrettifyCaseClass = x.productArity == 0 || (x.productArity == 2 && Util.isOperator(x.productPrefix)) || className.startsWith(pprint.tuplePrefix) || className == "scala.Some"
if (shouldNotPrettifyCaseClass)
pprint.treeify(x)
else {
val fieldMap = caseClassToMap(x)
pprint.Tree.Apply(
x.productPrefix,
fieldMap.iterator.flatMap { case (k, v) =>
val prettyValue: Tree = pprintAdditionalHandlers.lift(v).getOrElse(pprint2.treeify(v))
Seq(pprint.Tree.Infix(Tree.Literal(k), "=", prettyValue))
}
)
}
}
pprint2 = pprint.copy(additionalHandlers = pprintAdditionalHandlers)
}
// usage
pprint2.println(SomeFancyObjectWithNestedCaseClasses(...))
import java.lang.reflect.Field
...
/**
* Pretty prints case classes with field names.
* Handles sequences and arrays of such values.
* Ideally, one could take the output and paste it into source code and have it compile.
*/
def prettyPrint(a: Any): String = {
// Recursively get all the fields; this will grab vals declared in parents of case classes.
def getFields(cls: Class[_]): List[Field] =
Option(cls.getSuperclass).map(getFields).getOrElse(Nil) ++
cls.getDeclaredFields.toList.filterNot(f =>
f.isSynthetic || java.lang.reflect.Modifier.isStatic(f.getModifiers))
a match {
// Make Strings look similar to their literal form.
case s: String =>
'"' + Seq("\n" -> "\\n", "\r" -> "\\r", "\t" -> "\\t", "\"" -> "\\\"", "\\" -> "\\\\").foldLeft(s) {
case (acc, (c, r)) => acc.replace(c, r) } + '"'
case xs: Seq[_] =>
xs.map(prettyPrint).toString
case xs: Array[_] =>
s"Array(${xs.map(prettyPrint) mkString ", "})"
// This covers case classes.
case p: Product =>
s"${p.productPrefix}(${
(getFields(p.getClass) map { f =>
f setAccessible true
s"${f.getName} = ${prettyPrint(f.get(p))}"
}) mkString ", "
})"
// General objects and primitives end up here.
case q =>
Option(q).map(_.toString).getOrElse("¡null!")
}
}
Just like parser combinators, Scala already contains pretty printer combinators in the standard library. (note: this library is deprecated as of Scala 2.11. A similar pretty printing library is a part of kiama open source project).
You are not saying it plainly in your question if you need the solution that does "reflection" or you'd like to build the printer explicitly. (though your "bonus question" hints you probably want "reflective" solution)
Anyway, in the case you'd like to develop simple pretty printer using plain Scala library, here it is. The following code is REPLable.
case class VarDecl(name: String, `type`: String)
case class ClassDecl(name: String, fields: List[VarDecl])
import scala.text._
import Document._
def varDoc(x: VarDecl) =
nest(4, text("- VarDecl") :/:
group("name = " :: text(x.name)) :/:
group("type = " :: text(x.`type`))
)
def classDoc(x: ClassDecl) = {
val docs = ((empty:Document) /: x.fields) { (d, f) => varDoc(f) :/: d }
nest(2, text("ClassDecl") :/:
group("name = " :: text(x.name)) :/:
group("fields =" :/: docs))
}
def prettyPrint(d: Document) = {
val writer = new java.io.StringWriter
d.format(1, writer)
writer.toString
}
prettyPrint(classDoc(
ClassDecl("Complex", VarDecl("Real","float") :: VarDecl("Imag","float") :: Nil)
))
Bonus question: wrap the printers into type classes for even greater composability.
The nicest, most concise "out-of-the" box experience I've found is with the Kiama pretty printing library. It doesn't print member names without using additional combinators, but with only import org.kiama.output.PrettyPrinter._; pretty(any(data)) you have a great start:
case class ClassDecl( kind : Kind, list : List[ VarDecl ] )
sealed trait Kind
case object Complex extends Kind
case class VarDecl( a : Int, b : String )
val data = ClassDecl(Complex,List(VarDecl(1, "abcd"), VarDecl(2, "efgh")))
import org.kiama.output.PrettyPrinter._
// `w` is the wrapping width. `1` forces wrapping all components.
pretty(any(data), w=1)
Produces:
ClassDecl (
Complex (),
List (
VarDecl (
1,
"abcd"),
VarDecl (
2,
"efgh")))
Note that this is just the most basic example. Kiama PrettyPrinter is an extremely powerful library with a rich set of combinators specifically designed for intelligent spacing, line wrapping, nesting, and grouping. It's very easy to tweak to suit your needs. As of this posting, it's available in SBT with:
libraryDependencies += "com.googlecode.kiama" %% "kiama" % "1.8.0"
Using reflection
import scala.reflect.ClassTag
import scala.reflect.runtime.universe._
object CaseClassBeautifier {
def getCaseAccessors[T: TypeTag] = typeOf[T].members.collect {
case m: MethodSymbol if m.isCaseAccessor => m
}.toList
def nice[T:TypeTag](x: T)(implicit classTag: ClassTag[T]) : String = {
val instance = x.asInstanceOf[T]
val mirror = runtimeMirror(instance.getClass.getClassLoader)
val accessors = getCaseAccessors[T]
var res = List.empty[String]
accessors.foreach { z ⇒
val instanceMirror = mirror.reflect(instance)
val fieldMirror = instanceMirror.reflectField(z.asTerm)
val s = s"${z.name} = ${fieldMirror.get}"
res = s :: res
}
val beautified = x.getClass.getSimpleName + "(" + res.mkString(", ") + ")"
beautified
}
}
This is a shamless copy paste of #F. P Freely, but
I've added an indentation feature
slight modifications so that the output will be of correct Scala style (and will compile for all primative types)
Fixed string literal bug
Added support for java.sql.Timestamp (as I use this with Spark a lot)
Tada!
// Recursively get all the fields; this will grab vals declared in parents of case classes.
def getFields(cls: Class[_]): List[Field] =
Option(cls.getSuperclass).map(getFields).getOrElse(Nil) ++
cls.getDeclaredFields.toList.filterNot(f =>
f.isSynthetic || java.lang.reflect.Modifier.isStatic(f.getModifiers))
// FIXME fix bug where indent seems to increase too much
def prettyfy(a: Any, indentSize: Int = 0): String = {
val indent = List.fill(indentSize)(" ").mkString
val newIndentSize = indentSize + 2
(a match {
// Make Strings look similar to their literal form.
case string: String =>
val conversionMap = Map('\n' -> "\\n", '\r' -> "\\r", '\t' -> "\\t", '\"' -> "\\\"", '\\' -> "\\\\")
string.map(c => conversionMap.getOrElse(c, c)).mkString("\"", "", "\"")
case xs: Seq[_] =>
xs.map(prettyfy(_, newIndentSize)).toString
case xs: Array[_] =>
s"Array(${xs.map(prettyfy(_, newIndentSize)).mkString(", ")})"
case map: Map[_, _] =>
s"Map(\n" + map.map {
case (key, value) => " " + prettyfy(key, newIndentSize) + " -> " + prettyfy(value, newIndentSize)
}.mkString(",\n") + "\n)"
case None => "None"
case Some(x) => "Some(" + prettyfy(x, newIndentSize) + ")"
case timestamp: Timestamp => "new Timestamp(" + timestamp.getTime + "L)"
case p: Product =>
s"${p.productPrefix}(\n${
getFields(p.getClass)
.map { f =>
f.setAccessible(true)
s" ${f.getName} = ${prettyfy(f.get(p), newIndentSize)}"
}
.mkString(",\n")
}\n)"
// General objects and primitives end up here.
case q =>
Option(q).map(_.toString).getOrElse("null")
})
.split("\n", -1).mkString("\n" + indent)
}
E.g.
case class Foo(bar: String, bob: Int)
case class Alice(foo: Foo, opt: Option[String], opt2: Option[String])
scala> prettyPrint(Alice(Foo("hello world", 10), Some("asdf"), None))
res6: String =
Alice(
foo = Foo(
bar = "hello world",
bob = 10
),
opt = Some("asdf"),
opt2 = None
)
If you use Apache Spark, you can use the following method to print your case classes :
def prettyPrint[T <: Product : scala.reflect.runtime.universe.TypeTag](c:T) = {
import play.api.libs.json.Json
println(Json.prettyPrint(Json.parse(Seq(c).toDS().toJSON.head)))
}
This gives a nicely formatted JSON representation of your case class instance. Make sure sparkSession.implicits._ is imported
example:
case class Adress(country:String,city:String,zip:Int,street:String)
case class Person(name:String,age:Int,adress:Adress)
val person = Person("Peter",36,Adress("Switzerland","Zürich",9876,"Bahnhofstrasse 69"))
prettyPrint(person)
gives :
{
"name" : "Peter",
"age" : 36,
"adress" : {
"country" : "Switzerland",
"city" : "Zürich",
"zip" : 9876,
"street" : "Bahnhofstrasse 69"
}
}
I would suggest using the same print that is used in the AssertEquals of your test framework of choice. I was using Scalameta and munit.Assertions.munitPrint(clue: => Any): String does the trick. I can pass nested classes to it and see the whole tree with the proper indentation.

Scala: how to traverse stream/iterator collecting results into several different collections

I'm going through log file that is too big to fit into memory and collecting 2 type of expressions, what is better functional alternative to my iterative snippet below?
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)]={
val lines : Iterator[String] = io.Source.fromFile(file).getLines()
val logins: mutable.Map[String, String] = new mutable.HashMap[String, String]()
val errors: mutable.ListBuffer[(String, String)] = mutable.ListBuffer.empty
for (line <- lines){
line match {
case errorPat(date,ip)=> errors.append((ip,date))
case loginPat(date,user,ip,id) =>logins.put(ip, id)
case _ => ""
}
}
errors.toList.map(line => (logins.getOrElse(line._1,"none") + " " + line._1,line._2))
}
Here is a possible solution:
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String,String)] = {
val lines = Source.fromFile(file).getLines
val (err, log) = lines.collect {
case errorPat(inf, ip) => (Some((ip, inf)), None)
case loginPat(_, _, ip, id) => (None, Some((ip, id)))
}.toList.unzip
val ip2id = log.flatten.toMap
err.collect{ case Some((ip,inf)) => (ip2id.getOrElse(ip,"none") + "" + ip, inf) }
}
Corrections:
1) removed unnecessary types declarations
2) tuple deconstruction instead of ulgy ._1
3) left fold instead of mutable accumulators
4) used more convenient operator-like methods :+ and +
def streamData(file: File, errorPat: Regex, loginPat: Regex): List[(String, String)] = {
val lines = io.Source.fromFile(file).getLines()
val (logins, errors) =
((Map.empty[String, String], Seq.empty[(String, String)]) /: lines) {
case ((loginsAcc, errorsAcc), next) =>
next match {
case errorPat(date, ip) => (loginsAcc, errorsAcc :+ (ip -> date))
case loginPat(date, user, ip, id) => (loginsAcc + (ip -> id) , errorsAcc)
case _ => (loginsAcc, errorsAcc)
}
}
// more concise equivalent for
// errors.toList.map { case (ip, date) => (logins.getOrElse(ip, "none") + " " + ip) -> date }
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
I have a few suggestions:
Instead of a pair/tuple, it's often better to use your own class. It gives meaningful names to both the type and its fields, which makes the code much more readable.
Split the code into small parts. In particular, try to decouple pieces of code that don't need to be tied together. This makes your code easier to understand, more robust, less prone to errors and easier to test. In your case it'd be good to separate producing your input (lines of a log file) and consuming it to produce a result. For example, you'd be able to make automatic tests for your function without having to store sample data in a file.
As an example and exercise, I tried to make a solution based on Scalaz iteratees. It's a bit longer (includes some auxiliary code for IteratorEnumerator) and perhaps it's a bit overkill for the task, but perhaps someone will find it helpful.
import java.io._;
import scala.util.matching.Regex
import scalaz._
import scalaz.IterV._
object MyApp extends App {
// A type for the result. Having names keeps things
// clearer and shorter.
type LogResult = List[(String,String)]
// Represents a state of our computation. Not only it
// gives a name to the data, we can also put here
// functions that modify the state. This nicely
// separates what we're computing and how.
sealed case class State(
logins: Map[String,String],
errors: Seq[(String,String)]
) {
def this() = {
this(Map.empty[String,String], Seq.empty[(String,String)])
}
def addError(date: String, ip: String): State =
State(logins, errors :+ (ip -> date));
def addLogin(ip: String, id: String): State =
State(logins + (ip -> id), errors);
// Produce the final result from accumulated data.
def result: LogResult =
for ((ip, date) <- errors.toList)
yield (logins.getOrElse(ip, "none") + " " + ip) -> date
}
// An iteratee that consumes lines of our input. Based
// on the given regular expressions, it produces an
// iteratee that parses the input and uses State to
// compute the result.
def logIteratee(errorPat: Regex, loginPat: Regex):
IterV[String,List[(String,String)]] = {
// Consumes a signle line.
def consume(line: String, state: State): State =
line match {
case errorPat(date, ip) => state.addError(date, ip);
case loginPat(date, user, ip, id) => state.addLogin(ip, id);
case _ => state
}
// The core of the iteratee. Every time we consume a
// line, we update our state. When done, compute the
// final result.
def step(state: State)(s: Input[String]): IterV[String, LogResult] =
s(el = line => Cont(step(consume(line, state))),
empty = Cont(step(state)),
eof = Done(state.result, EOF[String]))
// Return the iterate waiting for its first input.
Cont(step(new State()));
}
// Converts an iterator into an enumerator. This
// should be more likely moved to Scalaz.
// Adapted from scalaz.ExampleIteratee
implicit val IteratorEnumerator = new Enumerator[Iterator] {
#annotation.tailrec def apply[E, A](e: Iterator[E], i: IterV[E, A]): IterV[E, A] = {
val next: Option[(Iterator[E], IterV[E, A])] =
if (e.hasNext) {
val x = e.next();
i.fold(done = (_, _) => None, cont = k => Some((e, k(El(x)))))
} else
None;
next match {
case None => i
case Some((es, is)) => apply(es, is)
}
}
}
// main ---------------------------------------------------
{
// Read a file as an iterator of lines:
// val lines: Iterator[String] =
// io.Source.fromFile("test.log").getLines();
// Create our testing iterator:
val lines: Iterator[String] = Seq(
"Error: 2012/03 1.2.3.4",
"Login: 2012/03 user 1.2.3.4 Joe",
"Error: 2012/03 1.2.3.5",
"Error: 2012/04 1.2.3.4"
).iterator;
// Create an iteratee.
val iter = logIteratee("Error: (\\S+) (\\S+)".r,
"Login: (\\S+) (\\S+) (\\S+) (\\S+)".r);
// Run the the iteratee against the input
// (the enumerator is implicit)
println(iter(lines).run);
}
}

Parentheses matching in Scala --- functional approach

Let's say I want to parse a string with various opening and closing brackets (I used parentheses in the title because I believe it is more common -- the question is the same nevertheless) so that I get all the higher levels separated in a list.
Given:
[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]
I want:
List("[hello:=[notting],[hill]]", "[3.4(4.56676|5.67787)]", "[the[hill[is[high]]not]]")
The way I am doing this is by counting the opening and closing brackets and adding to the list whenever I get my counter to 0. However, I have an ugly imperative code. You may assume that the original string is well formed.
My question is: what would be a nice functional approach to this problem?
Notes: I have thought of using the for...yield construct but given the use of the counters I cannot get a simple conditional (I must have conditionals just for updating the counters as well) and I do not know how I could use this construct in this case.
Quick solution using Scala parser combinator library:
import util.parsing.combinator.RegexParsers
object Parser extends RegexParsers {
lazy val t = "[^\\[\\]\\(\\)]+".r
def paren: Parser[String] =
("(" ~ rep1(t | paren) ~ ")" |
"[" ~ rep1(t | paren) ~ "]") ^^ {
case o ~ l ~ c => (o :: l ::: c :: Nil) mkString ""
}
def all = rep(paren)
def apply(s: String) = parseAll(all, s)
}
Checking it in REPL:
scala> Parser("[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]")
res0: Parser.ParseResult[List[String]] = [1.72] parsed: List([hello:=[notting],[hill]], [3.4(4.56676|5.67787)], [the[hill[is[high]]not]])
What about:
def split(input: String): List[String] = {
def loop(pos: Int, ends: List[Int], xs: List[String]): List[String] =
if (pos >= 0)
if ((input charAt pos) == ']') loop(pos-1, pos+1 :: ends, xs)
else if ((input charAt pos) == '[')
if (ends.size == 1) loop(pos-1, Nil, input.substring(pos, ends.head) :: xs)
else loop(pos-1, ends.tail, xs)
else loop(pos-1, ends, xs)
else xs
loop(input.length-1, Nil, Nil)
}
scala> val s1 = "[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]"
s1: String = [hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]
scala> val s2 = "[f[sad][add]dir][er][p]"
s2: String = [f[sad][add]dir][er][p]
scala> split(s1) foreach println
[hello:=[notting],[hill]]
[3.4(4.56676|5.67787)]
[the[hill[is[high]]not]]
scala> split(s2) foreach println
[f[sad][add]dir]
[er]
[p]
Given your requirements counting the parenthesis seems perfectly fine. How would you do that in a functional way? You can make the state explicitly passed around.
So first we define our state which accumulates results in blocks or concatenates the next block and keeps track of the depth:
case class Parsed(blocks: Vector[String], block: String, depth: Int)
Then we write a pure function that processed that returns the next state. Hopefully, we can just carefully look at this one function and ensure it's correct.
def nextChar(parsed: Parsed, c: Char): Parsed = {
import parsed._
c match {
case '[' | '(' => parsed.copy(block = block + c,
depth = depth + 1)
case ']' | ')' if depth == 1
=> parsed.copy(blocks = blocks :+ (block + c),
block = "",
depth = depth - 1)
case ']' | ')' => parsed.copy(block = block + c,
depth = depth - 1)
case _ => parsed.copy(block = block + c)
}
}
Then we just used a foldLeft to process the data with an initial state:
val data = "[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]"
val parsed = data.foldLeft(Parsed(Vector(), "", 0))(nextChar)
parsed.blocks foreach println
Which returns:
[hello:=[notting],[hill]]
[3.4(4.56676|5.67787)]
[the[hill[is[high]]not]]
You have an ugly imperative solution, so why not make a good-looking one? :)
This is an imperative translation of huynhjl's solution, but just posting to show that sometimes imperative is concise and perhaps easier to follow.
def parse(s: String) = {
var res = Vector[String]()
var depth = 0
var block = ""
for (c <- s) {
block += c
c match {
case '[' => depth += 1
case ']' => depth -= 1
if (depth == 0) {
res :+= block
block = ""
}
case _ =>
}
}
res
}
Try this:
val s = "[hello:=[notting],[hill]][3.4(4.56676|5.67787)][the[hill[is[high]]not]]"
s.split("]\\[").toList
returns:
List[String](
[hello:=[notting],[hill],
3.4(4.56676|5.67787),
the[hill[is[high]]not]]
)