Clever way to break a Seq[Any] into a case class - scala

I've been parsing a proprietary file format that has sections and each section has a number of records. The sections can be in any order and the records can be in any order. The order is not significant. While sections should not be duplicated, I can't guarantee that.
I've been using parboiled2 to generate the AST using a format like the following:
oneOrMore( Section1 | Section2 | Section3 )
Where every section generates a case class. They don't inherit from anything, resulting in a Seq[Any].
These section case classes also contain a Seq[T] of records specific to the section type.
I would like to transform the Seq[Any] into a
case class (section1:Seq[T1], section2:Seq[T2], section3:Seq[T3] )
Does someone have a clever and easy to read technique for that or should I make some mutable collections and use a foreach with a match?
I always feel like I am missing some Scala magic when I fall back to a foreach with vars.
EDIT 1:
It was brought up that I should extend a common base class, and it is true that I could. But I don't see what that changes about the solution if I still have to use match to identify the type. I want to separate out the different case class types; for instance, below I want to collect all the B's, C's, E's, and F's into a Seq[B], Seq[C], Seq[E], and Seq[F].
class A()
case class B(v:Int) extends A
case class C(v:String) extends A
case class E(v:Int)
case class F(v:String)
val a:Seq[A] = B(1) :: C("2") :: Nil
val d:Seq[Any] = E(3) :: F("4") :: Nil
a.head match {
case B(v) => v should equal (1)
case _ => fail()
}
a.last match {
case C(v) => v should equal ("2")
case _ => fail()
}
d.head match {
case E(v) => v should equal (3)
case _ => fail()
}
d.last match {
case F(v) => v should equal ("4")
case _ => fail()
}
EDIT 2: Folding solution
case class E(v:Int)
case class F(v:String)
val d:Seq[Any] = E(3) :: F("4") :: Nil
val Ts = d.foldLeft((Seq[E](), Seq[F]()))(
(c,r) => r match {
case e:E => c.copy(_1=c._1 :+ e)
case e:F => c.copy(_2=c._2 :+ e)
}
)
Ts should equal ( (E(3) :: Nil, F("4") :: Nil) )
EDIT 3: Exhaustivity
sealed trait A //sealed is important
case class E(v:Int) extends A
case class F(v:String) extends A
val d:Seq[A] = E(3) :: F("4") :: Nil // element type must be the sealed trait for the exhaustivity check
val Ts = d.foldLeft((Seq[E](), Seq[F]()))(
(c,r) => r match {
case e:E => c.copy(_1=c._1 :+ e)
case e:F => c.copy(_2=c._2 :+ e)
}
)
Ts should equal ( (E(3) :: Nil, F("4") :: Nil) )

While this could be done with shapeless to make a solution that is more terse (As Travis pointed out) I chose to go with a pure Scala solution based on Travis' feedback.
Here is an example of using foldLeft to manipulate a tuple housing strongly typed Seq[]. Unfortunately every type that is possible requires a case in the match which can become tedious if there are many types.
Also note that if the base trait is sealed, the match will give an exhaustivity warning when a type is missed, which makes this operation type safe.
sealed trait A //sealed is important
case class E(v:Int) extends A
case class F(v:String) extends A
val d:Seq[A] = E(3) :: F("4") :: Nil
val Ts = d.foldLeft((Seq[E](), Seq[F]()))(
(c,r) => r match {
case e:E => c.copy(_1=c._1 :+ e)
case e:F => c.copy(_2=c._2 :+ e)
}
)
Ts should equal ( (E(3) :: Nil, F("4") :: Nil) )
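Since the question asked for a case class rather than a tuple, the same fold can target one directly. This is only a minimal sketch: the `Sections` class, its field names `es` and `fs`, and the `SectionsDemo` object are hypothetical names that do not appear in the original post. A `collect`-based variant is shown as well; it is easier to read but walks the input once per field.

```scala
sealed trait A
case class E(v: Int) extends A
case class F(v: String) extends A

// Hypothetical result type; the field names are illustrative only.
case class Sections(es: Seq[E] = Vector.empty, fs: Seq[F] = Vector.empty)

object SectionsDemo {
  // Single pass: each element is appended to the field matching its type.
  def split(items: Seq[A]): Sections =
    items.foldLeft(Sections()) { (acc, item) =>
      item match {
        case e: E => acc.copy(es = acc.es :+ e)
        case f: F => acc.copy(fs = acc.fs :+ f)
      }
    }

  // Alternative: one `collect` per type; one pass over the input per field.
  def splitWithCollect(items: Seq[A]): Sections =
    Sections(
      es = items.collect { case e: E => e },
      fs = items.collect { case f: F => f }
    )
}
```

Named fields in `copy` avoid the `_1`/`_2` bookkeeping of the tuple version, at the cost of declaring one extra class.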

Related

Reduce/Fold only some elements

I have a file parser that produces a collection of elements all belonging to the same trait. It is similar to the following.
trait Data {
val identifier: String
}
case class Meta(identifier: String, props: Properties) extends Data
case class Complete(identifier: String, contents: Map[String, Any]) extends Data
case class Partial(identifier: String, name: String, value: Any) extends Data
...
def parse(file: File): Iterator[Data] = ... // this isn't relevant
What I am attempting to do is traverse the collection in a functional manner since I am processing a lot of data and want to be as memory conscious as possible. The collection when it is returned from the parse method is a mix of Complete, Meta, and Partial elements. The logic is that I need to pass the Complete and Meta elements through unchanged, while collecting the Partial elements and grouping on the identifier to create Complete elements.
With just a collection of Partial elements (Iterator[Partial]), I can do the following:
partialsOnly.groupBy(_.identifier)
.map{
case (ident, parts) =>
Complete(ident, parts.map(p => p.name -> p.value).toMap)
}
Is there a functional way, somewhat similar to scan that will accumulate elements, but only some elements, while letting the rest through unchanged?
You can use the partition function to split a collection in two based on a predicate.
val (partial: List[Data], completeAndMeta: List[Data]) = parse(new File("file")).toList.partition {
  case _: Partial => true
  case _ => false
}
From there, you want to make sure you can process partial as a List[Partial], ideally without tripping compiler warnings about type erasure or doing messy casts. You can do this with a call to collect, using a function that only accepts Partial's.
val partials: List[Partial] = partial.collect { case p: Partial => p }
Unfortunately, when used on an Iterator, partition may need to buffer arbitrary amounts of data, so isn't necessarily the most memory efficient technique. If memory management is a huge concern, you may need to sacrifice functional purity. Alternately, if you add some way of knowing when a Partial is completed, you can accumulate them in a Map via a foldLeft and emit the final value as they finish.
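The foldLeft idea mentioned above could look roughly like the following. It is a sketch under assumptions: the `Data` hierarchy is redeclared (without `Meta`) so the block is self-contained, `MergePartials` and `merge` are made-up names, and because nothing signals when a Partial group is finished, the merged Complete elements are only emitted at the end.

```scala
trait Data { val identifier: String }
case class Complete(identifier: String, contents: Map[String, Any]) extends Data
case class Partial(identifier: String, name: String, value: Any) extends Data

object MergePartials {
  // One pass over the input: Partial elements are merged into a Map keyed by
  // identifier, everything else is passed through unchanged.
  def merge(items: Iterator[Data]): List[Data] = {
    val (grouped, others) =
      items.foldLeft((Map.empty[String, Map[String, Any]], List.empty[Data])) {
        case ((acc, rest), p: Partial) =>
          val fields = acc.getOrElse(p.identifier, Map.empty[String, Any]) + (p.name -> p.value)
          (acc.updated(p.identifier, fields), rest)
        case ((acc, rest), other) =>
          (acc, other :: rest) // prepended, so `rest` is in reverse order
      }
    val completed = grouped.map { case (id, fields) => Complete(id, fields) }
    others.reverse ++ completed
  }
}
```

The iterator is consumed exactly once, so nothing is buffered on behalf of a second iterator, but the result list itself is still held in memory.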
Recursion might be a functional way to solve your problem:
def parse(list: List[Data]): (List[Data], List[Data]) = {
list match {
case (x:Partial) :: xs =>
val (partials, rest) = parse(xs)
(x :: partials, rest) //instead of creating list, you can join partials here
case x :: xs =>
val (partials, rest) = parse(xs)
(partials, x :: rest)
case _ => (Nil, Nil)
}
}
val (partials, rest) = parse(list)
Unfortunately, this function is not tail recursive, so it might blow up the stack for longer lists.
You can solve it by using Eval from cats:
def parse2(list: List[Data]): Eval[(List[Data], List[Data])] =
Eval.now(list).flatMap {
case (x:Partial) :: xs =>
parse2(xs).map {
case (partials, rest) => (x :: partials, rest) //instead of creating list, you can join partials here
}
case x :: xs =>
parse2(xs).map {
case (partials, rest) => (partials, x :: rest)
}
case _ => Eval.now((Nil, Nil))
}
val (partialsResult, restResult) = parse2(longList).value
This solution is safe for the stack because it uses the heap, not the stack.
And here's a version that also groups the partials:
def parse3(list: List[Data]): Eval[(Map[String, List[Partial]], List[Data])] =
Eval.now(list).flatMap {
case (x:Partial) :: xs =>
parse3(xs).map {
case (partials, rest) =>
val newPartials = x :: partials.getOrElse(x.identifier, Nil)
(partials + (x.identifier -> newPartials), rest)
}
case x :: xs =>
parse3(xs).map {
case (partials, rest) => (partials, x :: rest)
}
case _ => Eval.now((Map.empty[String, List[Partial]], Nil))
}
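If pulling in cats only for this is undesirable, a tail-recursive helper with explicit accumulators is another stack-safe option. The following is a minimal sketch, assuming the `Data` hierarchy from the question (redeclared here without `Meta` so the block is self-contained); `SplitTailrec`, `split`, and `loop` are illustrative names, not part of the answer above.

```scala
import scala.annotation.tailrec

trait Data { val identifier: String }
case class Complete(identifier: String, contents: Map[String, Any]) extends Data
case class Partial(identifier: String, name: String, value: Any) extends Data

object SplitTailrec {
  // Separates Partial elements from everything else, preserving input order.
  def split(list: List[Data]): (List[Partial], List[Data]) = {
    @tailrec
    def loop(remaining: List[Data],
             partials: List[Partial],
             rest: List[Data]): (List[Partial], List[Data]) =
      remaining match {
        case (p: Partial) :: tail => loop(tail, p :: partials, rest)
        case other :: tail        => loop(tail, partials, other :: rest)
        case Nil                  => (partials.reverse, rest.reverse)
      }
    loop(list, Nil, Nil)
  }
}
```

The `@tailrec` annotation makes the compiler reject the method if a recursive call is ever moved out of tail position.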

Scala Pattern-matching on a constructor with parameters, but also reusing the object

I am using a match statement and want to match on a particular constructor while also using the matched object on the right side of the case. E.g., in the following example I want to use the MyClass object on the right side, but I don't know how to refer to it.
x match {
case MyClass(a1)::remainingList => ?
case ...
}
x match {
case (head @ MyClass(s)) :: tail => ...
case Nil => ...
}
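For reference, here is a small self-contained sketch of the pattern binder in use. `MyClass` mirrors the class from the question; `BinderDemo` and `describe` are names invented for this illustration.

```scala
case class MyClass(s: String)

object BinderDemo {
  // `head @ MyClass(s)` binds the whole matched object to `head`
  // while still extracting its field as `s`.
  def describe(xs: List[MyClass]): String = xs match {
    case (head @ MyClass(s)) :: tail => s"head=$head, s=$s, remaining=${tail.size}"
    case Nil                         => "empty"
  }
}
```

The parentheses matter: without them the binder applies to the whole `::` pattern, so the name would refer to the list rather than to its first element.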
I think it's possible only with list.head
case class MyClass(s: String)
val seq = Seq(MyClass("a"), MyClass("b"))
seq match {
case my @ MyClass(s) :: xs => println(my.head)
}
Assign the match to a value for reference later, as in
case f: Foo => f.bar
Alternatively, if what you are matching is a case class, you can unpack its fields with a constructor pattern, like:
case Foo(foo, bar) => foo + bar
If you only want to match the head to a specific class you can pass the head into the match statement instead of the whole list
myList.head match {
case f: Foo => f.bar
}

More efficient Solution with tailrecursion?

I have the following ADT for Formulas. (shortened to the important ones)
sealed trait Formula
case class Variable(id: String) extends Formula
case class Negation(f: Formula) extends Formula
abstract class BinaryConnective(val f0: Formula, val f1: Formula) extends Formula
Note that the following methods are defined in an implicit class for formulas.
Let's say I want to get all variables from a formula.
My first approach was:
Solution 1
def variables: Set[Variable] = formula match {
case v: Variable => HashSet(v)
case Negation(f) => f.variables
case BinaryConnective(f0, f1) => f0.variables ++ f1.variables
case _ => HashSet.empty
}
This approach is very simple to understand, but not tail-recursive. So I wanted to try something different. I implemented a foreach on my tree-like formulas.
Solution 2
def foreach(func: Formula => Unit) = {
@tailrec
def foreach(list: List[Formula]): Unit = list match {
case Nil =>
case _ => foreach(list.foldLeft(List.empty[Formula])((next, formula) => {
func(formula)
formula match {
case Negation(f) => f :: next
case BinaryConnective(f0, f1) => f0 :: f1 :: next
case _ => next
}
}))
}
foreach(List(formula))
}
Now I can implement many methods with the help of the foreach.
def variables2 = {
val builder = Set.newBuilder[Variable]
formula.foreach {
case v: Variable => builder += v
case _ =>
}
builder.result
}
Now finally to the question. Which solution is preferable in terms of efficiency? At least I find my simple first solution more aesthetic.
I would expect Solution 2 to be more efficient, because you aren't creating many different HashSet instances and combining them together. It is also more general.
You can simplify your Solution 2, removing the foldLeft:
def foreach(func: Formula => Unit) = {
@tailrec
def foreach(list: List[Formula]): Unit = list match {
case Nil =>
case formula :: next => {
func(formula)
foreach {
formula match {
case Negation(f) => f :: next
case BinaryConnective(f0, f1) => f0 :: f1 :: next
case _ => next
}
}
}
}
foreach(List(formula))
}
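The work-list idea can also collect the variables directly, without going through foreach and a builder. The sketch below makes several assumptions that are not in the original post: a concrete connective `And` is invented so the example runs, a `BinaryConnective.unapply` extractor is added so that `case BinaryConnective(f0, f1)` compiles against the abstract class, and the method is written as a plain function in a made-up `FormulaOps` object rather than in an implicit class.

```scala
import scala.annotation.tailrec

sealed trait Formula
case class Variable(id: String) extends Formula
case class Negation(f: Formula) extends Formula
abstract class BinaryConnective(val f0: Formula, val f1: Formula) extends Formula
// Hypothetical concrete connective, added only so the example runs.
case class And(l: Formula, r: Formula) extends BinaryConnective(l, r)

object BinaryConnective {
  // Extractor that lets the abstract class be used in patterns.
  def unapply(b: BinaryConnective): Option[(Formula, Formula)] = Some((b.f0, b.f1))
}

object FormulaOps {
  // Work-list traversal: pending sub-formulas live on the heap, not the call stack.
  def variables(formula: Formula): Set[Variable] = {
    @tailrec
    def loop(pending: List[Formula], acc: Set[Variable]): Set[Variable] = pending match {
      case Nil                              => acc
      case (v: Variable) :: rest            => loop(rest, acc + v)
      case Negation(f) :: rest              => loop(f :: rest, acc)
      case BinaryConnective(f0, f1) :: rest => loop(f0 :: f1 :: rest, acc)
      case _ :: rest                        => loop(rest, acc) // any other formula kind
    }
    loop(List(formula), Set.empty)
  }
}
```

Only one immutable set is threaded through the loop, so no intermediate sets are merged with `++` as in Solution 1.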

How to understand "pattern match with Singleton object" in scala?

The context of my question is similar to some others asked in the forum, but I cannot find an exact match and it still remains a mystery to me after viewing those answers. So I appreciate it if someone can help. The context of my question is to match a singleton class object using a pattern match.
For example, if I am implementing a list structure of my own, like this
// An implementation of list
trait AList[+T] // covariant
case class Cons[+T](val head: T, val tail: AList[T]) extends AList[T]
case object Empty extends AList[Nothing] // singleton object
// an instance of implemented list
val xs = Cons(1, Cons(2, Cons(3, Empty)))
// pattern matching in a method - IT WORKS!
def foo[T](xs: AList[T]) = xs match {
case Empty => "empty"
case Cons(x, xss) => s"[$x...]"
}
println(foo(xs)) // => [1...]
// pattern matching outside - IT RAISES ERROR:
// pattern type is incompatible with expected type;
// found : Empty.type
// required: Cons[Nothing]
val r: String = xs match {
case Empty => "EMPTY"
case Cons(x, xss) => s"[$x...]"
}
println(r) // does NOT compile
To me they look like the same "matching" on the same "objects", so how come one worked and the other failed? I guess the error has something to do with the difference between matching an expression inside and outside a method, but the message given by the compiler was quite misleading. Does it mean we need to explicitly cast xs, like xs.asInstanceOf[AList[Int]], when "matching" outside?
The compiler is telling you that the type of xs is Cons, so it can never be Empty, which makes your first case pointless.
Try this:
val r: String = (xs: AList[Int]) match {
case Empty => "EMPTY"
case Cons(x, xss) => s"[$x...]"
}
Or this:
val ys: AList[Int] = xs
val r: String = ys match {
case Empty => "EMPTY"
case Cons(x, xss) => s"[$x...]"
}
In this case the compiler doesn't know that case Empty is pointless.
It's exactly what you are doing with def foo[T](xs: AList[T]) = .... You'd get the same compilation error with def foo[T](xs: Cons[T]) = ....
In this particular example, a valid and exhaustive match looks like this:
val r: String = xs match {
// case Empty => "EMPTY" // would never happen.
case Cons(x, xss) => s"[$x...]"
}
Addition: you should make your AList trait sealed:
sealed trait AList[+T]
It allows the compiler to warn you about non-exhaustive matches:
val r: String = (xs: AList[Int]) match {
case Cons(x, xss) => s"[$x...]"
}
<console>:25: warning: match may not be exhaustive.
It would fail on the following input: Empty
val r: String = (xs: AList[Int]) match {
^
The parameter of foo is an AList[T], so in the first case the matching is done on an AList[T]. In the second case the matching is done on a Cons[Int], the type inferred for xs.
Basically, matching is checked against the static type of the expression, not against the object itself.
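To make the point about static types concrete, here is a compact sketch that declares the value with the trait type up front, so the match outside a method accepts both cases. The list definitions repeat the question's; `StaticTypeDemo` and `show` are names made up for this example.

```scala
sealed trait AList[+T]
case class Cons[+T](head: T, tail: AList[T]) extends AList[T]
case object Empty extends AList[Nothing]

object StaticTypeDemo {
  // Without the annotation the inferred type would be Cons[Int], which can never be Empty.
  // Annotating with the trait widens the static type, so both cases are legal.
  val xs: AList[Int] = Cons(1, Cons(2, Cons(3, Empty)))

  // Inside a method the parameter type already is the trait.
  def show[T](list: AList[T]): String = list match {
    case Empty      => "EMPTY"
    case Cons(x, _) => s"[$x...]"
  }

  // Outside a method the same match now compiles as well.
  val r: String = xs match {
    case Empty      => "EMPTY"
    case Cons(x, _) => s"[$x...]"
  }
}
```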

Building variations of nested case classes

So I got something like this:
abstract class Term
case class App(f:Term,x:Term) extends Term
case class Var(s:String) extends Term
case class Amb(a:Term, b:Term) extends Term //ambiguity
And a Term may look like this:
App(Var(f),Amb(Var(x),Amb(Var(y),Var(z))))
So what I need is all variations that are indicated by the Amb class.
This is used to represent an ambiguous parse forest, and I want to type check each possible variation and select the right one.
In this example I would need:
App(Var(f),Var(x))
App(Var(f),Var(y))
App(Var(f),Var(z))
What's the best way to create these variations in Scala?
Efficiency would be nice, but is not really a requirement.
If possible, I'd like to refrain from using reflection.
Scala provides pattern matching to solve these kinds of problems. A solution would look like:
def matcher(term: Term): List[Term] = {
term match {
case Amb(a, b) => matcher(a) ++ matcher(b)
case App(a, b) => for { va <- matcher(a); vb <- matcher(b) } yield App(va, vb)
case v: Var => List(v)
}
}
You can do this pretty cleanly with a recursive function that traverses the tree and expands ambiguities:
sealed trait Term
case class App(f: Term, x: Term) extends Term
case class Var(s: String) extends Term
case class Amb(a: Term, b: Term) extends Term
def det(term: Term): Stream[Term] = term match {
case v: Var => Stream(v)
case App(f, x) => det(f).flatMap(detf => det(x).map(App(detf, _)))
case Amb(a, b) => det(a) ++ det(b)
}
Note that I'm using a sealed trait instead of an abstract class in order to take advantage of the compiler's ability to check exhaustivity.
It works as expected:
scala> val app = App(Var("f"), Amb(Var("x"), Amb(Var("y"), Var("z"))))
app: App = App(Var(f),Amb(Var(x),Amb(Var(y),Var(z))))
scala> det(app) foreach println
App(Var(f),Var(x))
App(Var(f),Var(y))
App(Var(f),Var(z))
If you can change the Term API, you could more or less equivalently add a def det: Stream[Term] method there.
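On Scala 2.13 and later, Stream is deprecated in favour of LazyList. A possible adaptation of the function above is sketched here with a for-comprehension; wrapping it in a `Disambiguate` object is just a choice made for this example.

```scala
sealed trait Term
case class App(f: Term, x: Term) extends Term
case class Var(s: String) extends Term
case class Amb(a: Term, b: Term) extends Term

object Disambiguate {
  // Expands every Amb node into its alternatives; App combines the
  // alternatives of its two children pairwise.
  def det(term: Term): LazyList[Term] = term match {
    case v: Var    => LazyList(v)
    case App(f, x) => for { df <- det(f); dx <- det(x) } yield App(df, dx)
    case Amb(a, b) => det(a) ++ det(b)
  }
}
```

Because the result is lazy, a type checker can stop consuming alternatives as soon as one of them is accepted.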
Since my abstract syntax is fairly large (and I have several of them), I tried my luck with Kiama.
So here is the version Travis Brown and Mark posted, adapted to Kiama.
It's not pretty, but I hope it works. Comments are welcome.
def disambiguateRule: Strategy = rule {
case Amb(a: Term, b: Term) =>
rewrite(disambiguateRule)(a).asInstanceOf[List[_]] ++
rewrite(disambiguateRule)(b).asInstanceOf[List[_]]
case x =>
val ch = getChildren(x)
if(ch.isEmpty) {
List(x)
}
else {
val chdis = ch.map({ rewrite(disambiguateRule)(_) }) // get all disambiguate children
//create all combinations of the disambiguated children
val p = combinations(chdis.asInstanceOf[List[List[AnyRef]]])
//use dup from Kiama to recreate the term with every combination
val xs = for { newchildren <- p } yield dup(x.asInstanceOf[Product], newchildren.toArray)
xs
}
}
def combinations(ll: List[List[AnyRef]]): List[List[AnyRef]] = ll match {
case Nil => Nil
case x :: Nil => x.map { List(_) }
case x :: xs => combinations(xs).flatMap({ ys => x.map({ xx => xx :: ys }) })
}
def getChildren(x: Any): List[Any] = {
val l = new ListBuffer[Any]()
all(queryf {
case a => l += a
})(x)
l.toList
}
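The combinations helper is essentially a Cartesian product over the per-child alternatives. A generic, typed variant might look like the sketch below; `Cartesian` and `product` are hypothetical names. Note two differences from the function above: the results come out in a different order, and an empty input yields a single empty combination rather than no combinations.

```scala
object Cartesian {
  // Cartesian product of the alternatives for each child position.
  // product(List(List(1, 2), List(3))) yields List(List(1, 3), List(2, 3)).
  def product[A](choices: List[List[A]]): List[List[A]] =
    choices.foldRight(List(List.empty[A])) { (alternatives, acc) =>
      for { a <- alternatives; rest <- acc } yield a :: rest
    }
}
```

Keeping the element type generic avoids the AnyRef casts, although the Kiama rewrite itself may still need them at its boundaries.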