Nested Scala case classes to/from CSV

There are many nice libraries for writing/reading Scala case classes to/from CSV files. I'm looking for something that goes beyond that, which can handle nested case classes. For example, here a Match has two Players:
case class Player(name: String, ranking: Int)
case class Match(place: String, winner: Player, loser: Player)
val matches = List(
  Match("London", Player("Jane", 7), Player("Fred", 23)),
  Match("Rome", Player("Marco", 19), Player("Giulia", 3)),
  Match("Paris", Player("Isabelle", 2), Player("Julien", 5))
)
I'd like to effortlessly (no boilerplate!) write/read matches to/from this CSV:
place,winner.name,winner.ranking,loser.name,loser.ranking
London,Jane,7,Fred,23
Rome,Marco,19,Giulia,3
Paris,Isabelle,2,Julien,5
Note the automated header line using the dot "." to form the column name for a nested field, e.g. winner.ranking. I'd be delighted if someone could demonstrate a simple way to do this (say, using reflection or Shapeless).
[Motivation. During data analysis it's convenient to have a flat CSV to play around with, for sorting, filtering, etc., even when case classes are nested. And it would be nice if you could load nested case classes back from such files.]

Since a case-class is a Product, getting the values of the various fields is relatively easy. Getting the names of the fields/columns does require using Java reflection.
The following function takes a list of case-class instances and returns a list of rows, each of which is a list of strings. It uses recursion to collect the values and headers of nested case-class instances.
import scala.reflect.ClassTag

def toCsv[T <: Product](p: List[T])(implicit tag: ClassTag[T]): List[List[String]] = {
  // Build the header row, joining the names of nested fields with "."
  // (relies on getDeclaredFields returning fields in declaration order, which holds for scalac-generated case classes).
  def header(c: Class[_], prefix: String = ""): List[String] =
    c.getDeclaredFields.toList.flatMap { field =>
      val name = prefix + field.getName
      if (classOf[Product].isAssignableFrom(field.getType)) header(field.getType, name + ".")
      else List(name)
    }

  // Flatten one instance into a row of strings, recursing into nested case classes.
  def flatten(p: Product): List[String] =
    p.productIterator.flatMap {
      case p: Product => flatten(p)
      case v: Any => List(v.toString)
    }.toList

  header(tag.runtimeClass) :: p.map(flatten)
}
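For example (a quick usage sketch with the Match/Player data from the question), joining each row with commas reproduces the CSV shown above:

val rows = toCsv(matches)
println(rows.map(_.mkString(",")).mkString("\n"))
// place,winner.name,winner.ranking,loser.name,loser.ranking
// London,Jane,7,Fred,23
// ...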
However, constructing case classes from CSV is far more involved: reflection is needed to determine the types of the various fields, to create the values from the CSV strings, and to construct the case-class instances.
For simplicity (not that the code is simple, just so it isn't complicated further), I assume that the order of columns in the CSV is the same as if the file had been produced by the toCsv(...) function above.
The following function starts by creating a list of "instructions for how to process a single CSV row" (the instructions are also used to verify that the column headers in the CSV match the case-class properties). The instructions are then applied recursively to turn each CSV row into a case-class instance.
def fromCsv[T <: Product](csv: List[List[String]])(implicit tag: ClassTag[T]): List[T] = {
  // "Instructions" describe how to consume the cells of a single CSV row.
  trait Instruction {
    val name: String
    val header = true // does this instruction correspond to a column header?
  }
  case class BeginCaseClassField(name: String, clazz: Class[_]) extends Instruction {
    override val header = false
  }
  case class EndCaseClassField(name: String) extends Instruction {
    override val header = false
  }
  case class IntField(name: String) extends Instruction
  case class StringField(name: String) extends Instruction
  case class DoubleField(name: String) extends Instruction

  // Build the instruction list for a (possibly nested) case class.
  def scan(c: Class[_], prefix: String = ""): List[Instruction] = {
    c.getDeclaredFields.toList.flatMap { field =>
      val name = prefix + field.getName
      val fType = field.getType
      if (fType == classOf[Int]) List(IntField(name))
      else if (fType == classOf[Double]) List(DoubleField(name))
      else if (fType == classOf[String]) List(StringField(name))
      else if (classOf[Product].isAssignableFrom(fType)) BeginCaseClassField(name, fType) :: scan(fType, name + ".")
      else throw new IllegalArgumentException(s"Unsupported field type: $fType")
    } :+ EndCaseClassField(prefix)
  }

  // Consume instructions and row cells, accumulating constructor arguments;
  // nested case classes are built recursively via their first (primary) constructor.
  def produce(instructions: List[Instruction], row: List[String], argAccumulator: List[Any]): (List[Instruction], List[String], List[Any]) = instructions match {
    case IntField(_) :: tail => produce(tail, row.drop(1), argAccumulator :+ row.head.toInt)
    case StringField(_) :: tail => produce(tail, row.drop(1), argAccumulator :+ row.head)
    case DoubleField(_) :: tail => produce(tail, row.drop(1), argAccumulator :+ row.head.toDouble)
    case BeginCaseClassField(_, clazz) :: tail =>
      val (instructionRemaining, rowRemaining, constructorArgs) = produce(tail, row, List.empty)
      val newCaseClass = clazz.getConstructors.head.newInstance(constructorArgs.map(_.asInstanceOf[AnyRef]): _*)
      produce(instructionRemaining, rowRemaining, argAccumulator :+ newCaseClass)
    case EndCaseClassField(_) :: tail => (tail, row, argAccumulator)
    case Nil if row.isEmpty => (Nil, Nil, argAccumulator)
    case Nil => throw new IllegalArgumentException("Not all values from CSV row were used")
  }

  val instructions = BeginCaseClassField(".", tag.runtimeClass) :: scan(tag.runtimeClass)
  assert(csv.head == instructions.filter(_.header).map(_.name), "CSV header doesn't match target case-class fields")
  csv.drop(1).map(row => produce(instructions, row, List.empty)._3.head.asInstanceOf[T])
}
I've tested this using:
case class Player(name: String, ranking: Int, price: Double)
case class Match(place: String, winner: Player, loser: Player)
val matches = List(
  Match("London", Player("Jane", 7, 12.5), Player("Fred", 23, 11.1)),
  Match("Rome", Player("Marco", 19, 13.54), Player("Giulia", 3, 41.8)),
  Match("Paris", Player("Isabelle", 2, 31.7), Player("Julien", 5, 16.8))
)
val csv = toCsv(matches)
val matchesFromCsv = fromCsv[Match](csv)
assert(matches == matchesFromCsv)
Obviously this should be optimized and hardened if you ever want to use this for production...

Related

Filter list elements based on another list elements

I have 2 Lists: lista and listb. For each element in lista, I want to check if its a_type is among the b_type values of listb. If so, get the b_name for the corresponding b_type and construct an object objc. Then I should return the list of constructed objc.
Is there a way to do this in Scala and preferably without any mutable collections?
case class obja(a_id: String, a_type: String)
case class objb(b_id: String, b_type: String, b_name: String)
case class objc(c_id: String, c_type: String, c_name: String)
val lista: List[obja] = List(...)
val listb: List[objb] = List(...)
def getNames(alist: List[obja], blist: List[objb]): List[objc] = ???
Lookup in a list requires O(n) traversal, which is inefficient. Therefore, the first thing to do is create a map from b_type to b_name:
val bTypeToBname = listb.map(b => (b.b_type, b.b_name)).toMap
Then you iterate through lista, look up in the map whether there is a corresponding b_name for a given a.a_type, and construct the objc:
val cs = for {
  a <- lista
  b_name <- bTypeToBname.get(a.a_type)
} yield objc(a.a_id, a.a_type, b_name)
Notice how Scala for-comprehensions automatically filter out the cases for which bTypeToBname(a.a_type) isn't defined: the corresponding a is simply skipped. This is because we use bTypeToBname.get(a.a_type) (which returns an Option), as opposed to calling bTypeToBname(a.a_type) directly (which would throw a NoSuchElementException). As far as I understand, this filtering is exactly the behavior you wanted.
case class A(aId: String, aType: String)
case class B(bId: String, bType: String, bName: String)
case class C(cId: String, cType: String, cName: String)

def getNames(aList: List[A], bList: List[B]): List[C] = {
  val bMap: Map[String, B] = bList.map(b => b.bType -> b)(collection.breakOut)
  aList.flatMap(a => bMap.get(a.aType).map(b => C(a.aId, a.aType, b.bName)))
}
Same as Andrey's answer but without comprehension so you can see what's happening inside.
// make listb into a map from type to name for efficiency
val bs = listb.map(b => b.b_type -> b.b_name).toMap

val listc: Seq[objc] = lista
  .flatMap(a =>                                    // flatMap to exclude types not in listb
    bs.get(a.a_type)                               // get an Option from the map
      .map(bName => objc(a.a_id, a.a_type, bName)) // if there is a b_name for that type, make an objc
  )
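For illustration, here is a quick usage sketch of the camelCase getNames variant above (the sample data is made up):

val sampleAs = List(A("a1", "gold"), A("a2", "silver"), A("a3", "bronze"))
val sampleBs = List(B("b1", "gold", "Gold medal"), B("b2", "silver", "Silver medal"))

getNames(sampleAs, sampleBs)
// => List(C("a1", "gold", "Gold medal"), C("a2", "silver", "Silver medal"))
// "a3" is dropped because no B has bType == "bronze"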

Working scala code using a var in a pure function. Is this possible without a var?

Is it possible (or even worthwhile) to try to write the code block below without a var? It works with a var. This is not for an interview; it's my first attempt at Scala (I came from Java).
The problem: fit people as close to the front of a theatre as possible, while keeping each request (e.g. Jones, 4 tickets) in a single theatre section. The theatre sections, starting at the front, are sized 6, 6, 3, 5, 5... and so on. I'm trying to accomplish this by putting together all of the potential groups of ticket requests, and then choosing the best fitting group per section.
Here are the classes. A SeatingCombination is one possible combination of SeatingRequest (just the IDs) and the sum of their ticketCount(s):
class SeatingCombination(val idList: List[Int], val seatCount: Int)
class SeatingRequest(val id: Int, val partyName: String, val ticketCount: Int)
class TheatreSection(val sectionSize: Int, rowNumber: Int, sectionNumber: Int) {
  def id: String = rowNumber.toString + "_" + sectionNumber.toString
}
By the time we get to the below function...
1.) all of the possible combinations of SeatingRequest are in a list of SeatingCombination and ordered by descending size.
2.) all of the TheatreSection are listed in order.
def getSeatingMap(groups: List[SeatingCombination], sections: List[TheatreSection]): HashMap[Int, TheatreSection] = {
  var seatedMap = new HashMap[Int, TheatreSection]
  for (sect <- sections) {
    val bestFitOpt = groups.find(g => { g.seatCount <= sect.sectionSize && !isAnyListIdInMap(seatedMap, g.idList) })
    bestFitOpt.filter(_.idList.size > 0).foreach(_.idList.foreach(seatedMap.update(_, sect)))
  }
  seatedMap
}

def isAnyListIdInMap(map: HashMap[Int, TheatreSection], list: List[Int]): Boolean = {
  (for (id <- list) yield !map.get(id).isEmpty).reduce(_ || _)
}
I wrote the rest of the program without a var, but in this iterative section it seems impossible. Maybe with my implementation strategy it's impossible. From what else I've read, a var in a pure function is still functional. But it's been bothering me that I can't think of how to remove the var, because my textbook told me to try to avoid them, and I don't know what I don't know.
You can use foldLeft to iterate over sections with a running state (and, inside, fold again over the ids of the chosen group to add them all to that state):
sections.foldLeft(Map.empty[Int, TheatreSection]) {
  case (seatedMap, sect) =>
    val bestFitOpt = groups.find(g => g.seatCount <= sect.sectionSize && !isAnyListIdInMap(seatedMap, g.idList))
    bestFitOpt.
      filter(_.idList.size > 0).toList.     // convert Option to List
      flatMap(_.idList).                    // flatten idList out of the Option
      foldLeft(seatedMap)(_ + (_ -> sect))  // add all ids to the map with sect as value
}
By the way, you can simplify the second method using exists and map.contains:
def isAnyListIdInMap(map: Map[Int, TheatreSection], list: List[Int]): Boolean = {
  list.exists(id => map.contains(id))
}
list.exists(predicate: Int => Boolean) returns true if the predicate holds for at least one element of list.
map.contains(key) checks if map is defined at key.
If you want to be even more concise, you don't need to give a name to the argument of the predicate:
list.exists(map.contains)
Simply changing var to val should do it :)
I think you may be asking about getting rid of the mutable map, not of the var (it doesn't need to be a var in your code).
Things like this are usually written recursively in scala or using foldLeft, like other answers suggest. Here is a recursive version:
import scala.annotation.tailrec

@tailrec
def getSeatingMap(
  groups: List[SeatingCombination],
  sections: List[TheatreSection],
  result: Map[Int, TheatreSection] = Map.empty): Map[Int, TheatreSection] = sections match {
  case Nil => result
  case head :: tail =>
    val seated = groups
      .iterator
      .filter(_.idList.nonEmpty)
      .filterNot(_.idList.exists(result.contains))
      .find(_.seatCount <= head.sectionSize)
      .fold(List.empty[(Int, TheatreSection)])(_.idList.map(id => id -> head))
    getSeatingMap(groups, tail, result ++ seated)
}
btw, I don't think you need to test every id in the list for presence in the map - it should suffice to just look at the first one. You could also make it a bit more efficient, probably, if instead of checking the map every time to see whether the group is already seated, you just dropped it from the input list as soon as a section is assigned:
@tailrec
def selectGroup(
  sect: TheatreSection,
  groups: List[SeatingCombination],
  result: List[SeatingCombination] = Nil
): (List[(Int, TheatreSection)], List[SeatingCombination]) = groups match {
  case Nil => (Nil, result)
  case head :: tail if head.idList.nonEmpty && head.seatCount <= sect.sectionSize =>
    (head.idList.map(_ -> sect), result.reverse ++ tail)
  case head :: tail => selectGroup(sect, tail, head :: result)
}
and then in getSeatingMap:
...
case head :: tail =>
  val (seated, remaining) = selectGroup(head, groups)
  getSeatingMap(remaining, tail, result ++ seated)
Here is how I was able to achieve this without using the mutable.HashMap, following the comment's suggestion to use foldLeft:
class SeatingCombination(val idList: List[Int], val seatCount: Int)
class SeatingRequest(val id: Int, val partyName: String, val ticketCount: Int)
class TheatreSection(val sectionSize: Int, rowNumber: Int, sectionNumber: Int) {
  def id: String = rowNumber.toString + "_" + sectionNumber.toString
}
def getSeatingMap(groups: List[SeatingCombination], sections: List[TheatreSection]): Map[Int, TheatreSection] = {
  sections.foldLeft(Map.empty[Int, TheatreSection]) { (m, sect) =>
    val bestFitOpt = groups.find(g => {
      g.seatCount <= sect.sectionSize && !isAnyListIdInMap(m, g.idList)
    }).filter(_.idList.nonEmpty)
    val newEntries = bestFitOpt.map(_.idList.map(_ -> sect)).getOrElse(List.empty)
    m ++ newEntries
  }
}

def isAnyListIdInMap(map: Map[Int, TheatreSection], list: List[Int]): Boolean = {
  (for (id <- list) yield map.get(id).isDefined).reduce(_ || _)
}
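A quick usage sketch of this foldLeft version (the sample data is made up): requests 1 and 2 fit together in the front section, so the smaller section stays empty.

val groups = List(                       // ordered by descending size, as in the question
  new SeatingCombination(List(1, 2), 6), // requests 1 and 2 together: 6 seats
  new SeatingCombination(List(1), 4),
  new SeatingCombination(List(2), 2)
)
val sections = List(
  new TheatreSection(6, 1, 1),           // front section, 6 seats
  new TheatreSection(3, 1, 2)            // next section, 3 seats
)

getSeatingMap(groups, sections)
// => a map in which both request ids point at the front section (id "1_1")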

Filtering inside `for` with pattern matching

I am reading a TSV file using something like this:
case class Entry(entryType: Int, value: Int)
def filterEntries(): Iterator[Entry] = {
  for {
    line <- scala.io.Source.fromFile("filename").getLines()
  } yield new Entry(line.split("\t").map(x => x.toInt))
}
Now I am interested both in filtering out entries whose entryType is set to 0 and in ignoring lines with a column count greater or less than 2 (which does not match the constructor). I was wondering if there's an idiomatic way to achieve this, maybe using pattern matching and an unapply method in a companion object. The only thing I can think of is using .filter on the resulting iterator.
I will also accept solutions not involving a for loop, as long as they return Iterator[Entry]. The solutions must be tolerant to malformed input.
This is more state-of-arty:
package object liner {
  implicit class R(val sc: StringContext) {
    object r {
      def unapplySeq(s: String): Option[Seq[String]] = sc.parts.mkString.r unapplySeq s
    }
  }
}

package liner {
  case class Entry(entryType: Int, value: Int)

  object I {
    def unapply(s: String): Option[Int] = util.Try(s.toInt).toOption
  }

  object Test extends App {
    def lines = List("1 2", "3", "", " 4 5 ", "junk", "0, 100000", "6 7 8")
    def entries = lines flatMap {
      case r"""\s*${I(i)}(\d+)\s+${I(j)}(\d+)\s*""" if i != 0 => Some(Entry(i, j))
      case __________________________________________________ => None
    }
    Console println entries
  }
}
Hopefully, the regex interpolator will make it into the standard distro soon, but this shows how easy it is to rig up. Also hopefully, a scanf-style interpolator will allow easy extraction with case f"$i%d".
I just started using the "elongated wildcard" in patterns to align the arrows.
There is a pupal or maybe larval regex macro:
https://github.com/som-snytt/regextractor
You can create variables in the head of the for-comprehension and then use a guard:
edit: ensure length of array
for {
  line <- scala.io.Source.fromFile("filename").getLines()
  arr = line.split("\t").map(x => x.toInt)
  if arr.size == 2 && arr(0) != 0
} yield new Entry(arr(0), arr(1))
I have solved it using the following code:
import scala.util.{Try, Success}

val lines = List(
  "1\t2",
  "1\t",
  "2",
  "hello",
  "1\t3"
)

case class Entry(entryType: Int, value: Int)

object Entry {
  def unapply(line: String): Option[Entry] = {
    line.split("\t").map(x => Try(x.toInt)) match {
      case Array(Success(entryType), Success(value)) => Some(Entry(entryType, value))
      case _ =>
        println("Malformed line: " + line)
        None
    }
  }
}

for {
  line <- lines
  entryOption = Entry.unapply(line)
  if entryOption.isDefined
} yield entryOption.get
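For what it's worth, that last for-comprehension can also be written as a single flatMap over the Option returned by unapply (same behaviour):

lines.flatMap(line => Entry.unapply(line))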
The left hand side of a <- or = in a for-loop may be a fully-fledged pattern. So you may write this:
def filterEntries(): Iterator[Entry] = for {
  line <- scala.io.Source.fromFile("filename").getLines()
  arr = line.split("\t").map(x => x.toInt)
  if arr.size == 2
  // now you may use pattern matching to extract the array
  Array(entryType, value) = arr
  if entryType != 0
} yield Entry(entryType, value)
Note that this solution will throw a NumberFormatException if a field is not convertible to an Int. If you do not want that, you'll have to encapsulate x.toInt with a Try and pattern match again.
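A rough sketch of that hardening (just one way to do it, wrapping each cell in a Try as in the answer above):

import scala.util.{Success, Try}

def filterEntries(): Iterator[Entry] =
  scala.io.Source.fromFile("filename").getLines().flatMap { line =>
    line.split("\t").map(s => Try(s.toInt)) match {
      // exactly two well-formed Int columns, and entryType must not be 0
      case Array(Success(entryType), Success(value)) if entryType != 0 =>
        Some(Entry(entryType, value))
      case _ => None
    }
  }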

Allocation of Function Literals in Scala

I have a class that represents sales orders:
class SalesOrder(val f01:String, val f02:Int, ..., f50:Date)
The fXX fields are of various types. I am faced with the problem of creating an audit trail of my orders. Given two instances of the class, I have to determine which fields have changed. I have come up with the following:
class SalesOrder(val f01: String, val f02: Int, ..., val f50: Date) {
  def auditDifferences(that: SalesOrder): List[String] = {
    def diff[A](fieldName: String, getField: SalesOrder => A) =
      if (getField(this) != getField(that)) Some(fieldName) else None
    val diffList = diff("f01", _.f01) :: diff("f02", _.f02) :: ...
      :: diff("f50", _.f50) :: Nil
    diffList.flatten
  }
}
I was wondering what the compiler does with all the _.fXX functions: are they instantiated just once (statically) and shared by all instances of my class, or will they be instantiated every time I create an instance of my class?
My worry is that, since I will use a lot of SalesOrder instances, it may create a lot of garbage. Should I use a different approach?
One clean way of solving this problem would be to use the standard library's Ordering type class. For example:
class SalesOrder(val f01: String, val f02: Int, val f03: Char) {
  def diff(that: SalesOrder) = SalesOrder.fieldOrderings.collect {
    case (name, ord) if !ord.equiv(this, that) => name
  }
}

object SalesOrder {
  val fieldOrderings: List[(String, Ordering[SalesOrder])] = List(
    "f01" -> Ordering.by(_.f01),
    "f02" -> Ordering.by(_.f02),
    "f03" -> Ordering.by(_.f03)
  )
}
And then:
scala> val orderA = new SalesOrder("a", 1, 'a')
orderA: SalesOrder = SalesOrder@5827384f
scala> val orderB = new SalesOrder("b", 1, 'b')
orderB: SalesOrder = SalesOrder@3bf2e1c7
scala> orderA diff orderB
res0: List[String] = List(f01, f03)
You almost certainly don't need to worry about the performance of your original formulation, but this version is (arguably) nicer for unrelated reasons.
Yes, that creates 50 short-lived functions. I don't think you should be worried unless you have manifest evidence that this causes a performance problem in your case.
But I would define a method that transforms a SalesOrder into a Map[String, Any]; then you would just have:
trait SalesOrder {
  def fields: Map[String, Any]
}

def diff(a: SalesOrder, b: SalesOrder): Iterable[String] = {
  val af = a.fields
  val bf = b.fields
  af.collect { case (key, value) if bf(key) != value => key }
}
If the field names are indeed just incremental numbers, you could simplify further:
trait SalesOrder {
  def fields: Iterable[Any]
}

def diff(a: SalesOrder, b: SalesOrder): Iterable[String] =
  (a.fields zip b.fields).zipWithIndex.collect {
    case ((av, bv), idx) if av != bv => f"f${idx + 1}%02d"
  }
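A minimal sketch of what a concrete order could look like under this second trait (the class name and field values here are made up):

import java.util.Date

class SimpleSalesOrder(f01: String, f02: Int, f03: Date) extends SalesOrder {
  def fields: Iterable[Any] = List(f01, f02, f03)
}

val d = new Date(0)
val a = new SimpleSalesOrder("foo", 1, d)
val b = new SimpleSalesOrder("bar", 1, d)

diff(a, b) // => List("f01"), since only the first field differs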

pattern matching on a series of values in scala

I'm a Scala beginner and this piece of code makes me struggle.
Is there a way to do pattern matching to make sure everything I pass to Data is of the correct type? As you can see, I have quite strange datatypes...
class Data(
  val recipient: String,
  val templateText: String,
  val templateHtml: String,
  val blockMaps: Map[String, List[Map[String, String]]],
  templateMap: Map[String, String]
)
...
val dataParsed = JSON.parseFull(message)
dataParsed match {
  case dataParsed: Map[String, Any] => {
    def e(s: String) = dataParsed get s
    val templateText = e("template-text")
    val templateHtml = e("template-html")
    val recipient = e("email")
    val templateMap = e("data")
    val blockMaps = e("blkdata")
    val dependencies = new Data(recipient, templateText, templateHtml, blockMaps, templateMap)
    Core.inject ! dependencies
  }
...
I guess your problem is that you want to be able to pattern match the map that you get from parseFull(), but Map doesn't have an unapply.
So you could pattern match every single value, providing a default if it is not of the correct type:
val templateText: Option[String] = e("template-text") match {
  case Some(s: String) => Some(s)
  case _ => None
}
Or temporarily put all the data into some structure that can be pattern matched:
val data = (e("template-text"), e("template-html"), e("email"), e("data"),
e("blkdata"))
val dependencies: Option[Data] = data match {
case (templateText: String,
templateHtml: String,
blockMaps: Map[String,List[Map[String,String]]],
templateMap: Map[String,String]) =>
Some(new Data(recipient, templateText, templateHtml, blockMaps, templateMap))
case _ => None
}