Filter list elements based on another list elements - scala

I have 2 Lists: lista and listb. For each element in lista, I want to check if a_type of each element is in b_type of listb. If true, get the b_name for corresponding b_type and construct an object objc. And, then I should return the list of of constructed objc.
Is there a way to do this in Scala and preferably without any mutable collections?
case class obja = (a_id: String, a_type: String)
case class objb = (b_id: String, b_type: String, b_name: String)
case class objc = (c_id: String, c_type: String, c_name: String)
val lista: List[obja] = List(...)
val listb: List[objb] = List(...)
def getNames(alist: List[obja], blist: List[objb]): List[objc] = ???

Lookup in lists requires traversal in O(n) time, this is inefficient. Therefore, the first thing you do is to create a map from b_type to b_name:
val bTypeToBname = listb.map(b => (b.b_type, b_name)).toMap
Then you iterate through lista, look up in the map whether there is a corresponding b_name for a given a.a_type, and construct the objc:
val cs = for {
a <- lista
b_name <- bTypeToBname.get(a.a_type)
} yield objc(a.a_id, a.a_type, b_name)
Notice how Scala for-comprehensions automatically filter those cases for which bTypeToBname(a.a_type) isn't defined: then the corresponding a is simply skipped. This because we use bTypeToBname.get(a.a_type) (which returns an Option), as opposed to calling bTypeToBname(a.a_type) directly (this would lead to a NoSuchElementException). As far as I understand, this filtering is exactly the behavior you wanted.

case class A(aId: String, aType: String)
case class B(bId: String, bType: String, bName: String)
case class C(cId: String, cType: String, cName: String)
def getNames(aList: List[A], bList: List[B]): List[C] = {
val bMap: Map[String, B] = bList.map(b => b.bType -> b)(collection.breakOut)
aList.flatMap(a => bMap.get(a.aType).map(b => C(a.aId, a.aType, b.bName)))
}

Same as Andrey's answer but without comprehension so you can see what's happening inside.
// make listb into a map from type to name for efficiency
val bs = listb.map(b => b.b_type -> b_name).toMap
val listc: Seq[objc] = lista
.flatMap(a => // flatmap to exclude types not in listb
bs.get(a.a_type) // get an option from blist
.map(bName => objc(a.a_id, a.a_type, bName)) // if there is a b name for that type, make an objc
)

Related

Map doesn't add entry in recursive function

I'm working with scala and want to make a class with a function which add recursively something to a map.
class Index(val map: Map[String, String]) {
def add(e: (String, String)): Index = {
Index(map + (e._1 -> e._2))
}
def addAll(list: List[(String, String)], index: Index = Index()): Index = {
list match {
case ::(head, next) => addAll(next, add(head))
case Nil => index
}
}
}
object Index {
def apply(map: Map[String, String] = Map()) = {
new Index(map)
}
}
val index = Index()
val list = List(
("e1", "f1"),
("e2", "f2"),
("e3", "f3"),
)
val newIndex = index.addAll(list)
println(newIndex.map.size.toString())
I excepted this code to print 3, since the function is supposed to add 3 entries to the map, but the actual output is 1. What I'm doing wrong and how to solve it?
Online fiddle: https://scalafiddle.io/sf/eqSxPX9/0
There is a simple error where you are calling add(head) where it should be index.add(head).
However it is better to use a nested method when writing recursive routines like this, for example:
def addAll(list: List[(String, String)]): Index = {
#annotation.tailrec
def loop(rem: List[(String, String)], index: Index): Index = {
rem match {
case head :: tail => loop(tail, index.add(head))
case Nil => index
}
}
loop(list, Index())
}
This allows the function to be tail recursive and optimised by the compiler, and also avoids a spurious argument to the addAll method.
I see many problems with your code but to answer your question:
Each time you call addAll you create an Index with an empty Map.
In the line case ::(head, next) => addAll(next, add(head)) you are not using the index that you get from the parameter list. Shouldn't that be somehow updated?
Beware that the default map implementation is immutable and to updating a map means that you need to create a new one with the new value added.

Nested Scala case classes to/from CSV

There are many nice libraries for writing/reading Scala case classes to/from CSV files. I'm looking for something that goes beyond that, which can handle nested cases classes. For example, here a Match has two Players:
case class Player(name: String, ranking: Int)
case class Match(place: String, winner: Player, loser: Player)
val matches = List(
Match("London", Player("Jane",7), Player("Fred",23)),
Match("Rome", Player("Marco",19), Player("Giulia",3)),
Match("Paris", Player("Isabelle",2), Player("Julien",5))
)
I'd like to effortlessly (no boilerplate!) write/read matches to/from this CSV:
place,winner.name,winner.ranking,loser.name,loser.ranking
London,Jane,7,Fred,23
Rome,Marco,19,Giulia,3
Paris,Isabelle,2,Julien,5
Note the automated header line using the dot "." to form the column name for a nested field, e.g. winner.ranking. I'd be delighted if someone could demonstrate a simple way to do this (say, using reflection or Shapeless).
[Motivation. During data analysis it's convenient to have a flat CSV to play around with, for sorting, filtering, etc., even when case classes are nested. And it would be nice if you could load nested case classes back from such files.]
Since a case-class is a Product, getting the values of the various fields is relatively easy. Getting the names of the fields/columns does require using Java reflection.
The following function takes a list of case-class instances and returns a list of rows, each is a list of strings. It is using a recursion to get the values and headers of child case-class instances.
def toCsv(p: List[Product]): List[List[String]] = {
def header(c: Class[_], prefix: String = ""): List[String] = {
c.getDeclaredFields.toList.flatMap { field =>
val name = prefix + field.getName
if (classOf[Product].isAssignableFrom(field.getType)) header(field.getType, name + ".")
else List(name)
}
}
def flatten(p: Product): List[String] =
p.productIterator.flatMap {
case p: Product => flatten(p)
case v: Any => List(v.toString)
}.toList
header(classOf[Match]) :: p.map(flatten)
}
However, constructing case-classes from CSV is far more involved, requiring to use reflection for getting the types of the various fields, for creating the values from the CSV strings and for constructing the case-class instances.
For simplicity (not saying the code is simple, just so it won't be further complicated), I assume that the order of columns in the CSV is the same as if the file was produced by the toCsv(...) function above.
The following function starts by creating a list of "instructions how to process a single CSV row" (the instructions are also used to verify that the column headers in the CSV matches the the case-class properties). The instructions are then used to recursively produce one CSV row at a time.
def fromCsv[T <: Product](csv: List[List[String]])(implicit tag: ClassTag[T]): List[T] = {
trait Instruction {
val name: String
val header = true
}
case class BeginCaseClassField(name: String, clazz: Class[_]) extends Instruction {
override val header = false
}
case class EndCaseClassField(name: String) extends Instruction {
override val header = false
}
case class IntField(name: String) extends Instruction
case class StringField(name: String) extends Instruction
case class DoubleField(name: String) extends Instruction
def scan(c: Class[_], prefix: String = ""): List[Instruction] = {
c.getDeclaredFields.toList.flatMap { field =>
val name = prefix + field.getName
val fType = field.getType
if (fType == classOf[Int]) List(IntField(name))
else if (fType == classOf[Double]) List(DoubleField(name))
else if (fType == classOf[String]) List(StringField(name))
else if (classOf[Product].isAssignableFrom(fType)) BeginCaseClassField(name, fType) :: scan(fType, name + ".")
else throw new IllegalArgumentException(s"Unsupported field type: $fType")
} :+ EndCaseClassField(prefix)
}
def produce(instructions: List[Instruction], row: List[String], argAccumulator: List[Any]): (List[Instruction], List[String], List[Any]) = instructions match {
case IntField(_) :: tail => produce(tail, row.drop(1), argAccumulator :+ row.head.toString.toInt)
case StringField(_) :: tail => produce(tail, row.drop(1), argAccumulator :+ row.head.toString)
case DoubleField(_) :: tail => produce(tail, row.drop(1), argAccumulator :+ row.head.toString.toDouble)
case BeginCaseClassField(_, clazz) :: tail =>
val (instructionRemaining, rowRemaining, constructorArgs) = produce(tail, row, List.empty)
val newCaseClass = clazz.getConstructors.head.newInstance(constructorArgs.map(_.asInstanceOf[AnyRef]): _*)
produce(instructionRemaining, rowRemaining, argAccumulator :+ newCaseClass)
case EndCaseClassField(_) :: tail => (tail, row, argAccumulator)
case Nil if row.isEmpty => (Nil, Nil, argAccumulator)
case Nil => throw new IllegalArgumentException("Not all values from CSV row were used")
}
val instructions = BeginCaseClassField(".", tag.runtimeClass) :: scan(tag.runtimeClass)
assert(csv.head == instructions.filter(_.header).map(_.name), "CSV header doesn't match target case-class fields")
csv.drop(1).map(row => produce(instructions, row, List.empty)._3.head.asInstanceOf[T])
}
I've tested this using:
case class Player(name: String, ranking: Int, price: Double)
case class Match(place: String, winner: Player, loser: Player)
val matches = List(
Match("London", Player("Jane", 7, 12.5), Player("Fred", 23, 11.1)),
Match("Rome", Player("Marco", 19, 13.54), Player("Giulia", 3, 41.8)),
Match("Paris", Player("Isabelle", 2, 31.7), Player("Julien", 5, 16.8))
)
val csv = toCsv(matches)
val matchesFromCsv = fromCsv[Match](csv)
assert(matches == matchesFromCsv)
Obviously this should be optimized and hardened if you ever want to use this for production...

Working scala code using a var in a pure function. Is this possible without a var?

Is it possible (or even worthwhile) to try to write the below code block without a var? It works with a var. This is not for an interview, it's my first attempt at scala (came from java).
The problem: Fit people as close to the front of a theatre as possible, while keeping each request (eg. Jones, 4 tickets) in a single theatre section. The theatre sections, starting at the front, are sized 6, 6, 3, 5, 5... and so on. I'm trying to accomplish this by putting together all of the potential groups of ticket requests, and then choosing the best fitting group per section.
Here are the classes. A SeatingCombination is one possible combination of SeatingRequest (just the IDs) and the sum of their ticketCount(s):
class SeatingCombination(val idList: List[Int], val seatCount: Int){}
class SeatingRequest(val id: Int, val partyName: String, val ticketCount: Int){}
class TheatreSection(val sectionSize: Int, rowNumber: Int, sectionNumber: Int) {
def id: String = rowNumber.toString + "_"+ sectionNumber.toString;
}
By the time we get to the below function...
1.) all of the possible combinations of SeatingRequest are in a list of SeatingCombination and ordered by descending size.
2.) all of the TheatreSection are listed in order.
def getSeatingMap(groups: List[SeatingCombination], sections: List[TheatreSection]): HashMap[Int, TheatreSection] = {
var seatedMap = new HashMap[Int, TheatreSection]
for (sect <- sections) {
val bestFitOpt = groups.find(g => { g.seatCount <= sect.sectionSize && !isAnyListIdInMap(seatedMap, g.idList) })
bestFitOpt.filter(_.idList.size > 0).foreach(_.idList.foreach(seatedMap.update(_, sect)))
}
seatedMap
}
def isAnyListIdInMap(map: HashMap[Int, TheatreSection], list: List[Int]): Boolean = {
(for (id <- list) yield !map.get(id).isEmpty).reduce(_ || _)
}
I wrote the rest of the program without a var, but in this iterative section it seems impossible. Maybe with my implementation strategy it's impossible. From what else I've read, a var in a pure function is still functional. But it's been bothering me I can't think of how to remove the var, because my textbook told me to try to avoid them, and I don't know what I don't know.
You can use foldLeft to iterate on sections with a running state (and again, inside, on your state to add iteratively all the ids in a section):
sections.foldLeft(Map.empty[Int, TheatreSection]){
case (seatedMap, sect) =>
val bestFitOpt = groups.find(g => g.seatCount <= sect.sectionSize && !isAnyListIdInMap(seatedMap, g.idList))
bestFitOpt.
filter(_.idList.size > 0).toList. //convert option to list
flatMap(_.idList). // flatten list from option and idList
foldLeft(seatedMap)(_ + (_ -> sect))) // add all ids to the map with sect as value
}
By the way, you can simplify the second method using exists and map.contains:
def isAnyListIdInMap(map: HashMap[Int, TheatreSection], list: List[Int]): Boolean = {
list.exists(id => map.contains(id))
}
list.exists(predicate: Int => Boolean) is a Boolean which is true if the predicate is true for any element in list.
map.contains(key) checks if map is defined at key.
If you want to be even more concise, you don't need to give a name to the argument of the predicate:
list.exists(map.contains)
Simply changing var to val should do it :)
I think, you may be asking about getting rid of the mutable map, not of the var (it doesn't need to be var in your code).
Things like this are usually written recursively in scala or using foldLeft, like other answers suggest. Here is a recursive version:
#tailrec
def getSeatingMap(
groups: List[SeatingCombination],
sections: List[TheatreSection],
result: Map[Int, TheatreSection] = Map.empty): Map[Int, TheatreSection] = sections match {
case Nil => result
case head :: tail =>
val seated = groups
.iterator
.filter(_.idList.nonEmpty)
.filterNot(_.idList.find(result.contains).isDefined)
.find(_.seatCount <= head.sectionSize)
.fold(Nil)(_.idList.map(id => id -> sect))
getSeatingMap(groups, tail, result ++ seated)
}
btw, I don't think you need to test every id in list for presence in the map - should suffice to just look at the first one. You could also make it a bit more efficient, probably, if instead of checking the map every time to see if the group is already seated, you'd just drop it from the input list as soon as the section is assigned.
#tailrec
def selectGroup(
sect: TheatreSection,
groups: List[SeatingCombination],
result: List[SeatingCombination] = Nil
): (List[(Int, TheatreSection)], List[SeatingCombination]) = groups match {
case Nil => (Nil, result)
case head :: tail
if(head.idList.nonEmpty && head.seatCount <= sect.sectionSize) => (head.idList.map(_ -> sect), result.reverse ++ tail)
case head :: tail => selectGroup(sect, tail, head :: result)
}
and then in getSeatingMap:
...
case head :: tail =>
val(seated, remaining) => selectGroup(sect, groups)
getSeatingMap(remaining, tail, result ++ seated)
Here is how I was able to achieve without using the mutable.HashMap, the suggestion by the comment to use foldLeft was used to do it:
class SeatingCombination(val idList: List[Int], val seatCount: Int){}
class SeatingRequest(val id: Int, val partyName: String, val ticketCount: Int){}
class TheatreSection(val sectionSize: Int, rowNumber: Int, sectionNumber: Int) {
def id: String = rowNumber.toString + "_"+ sectionNumber.toString;
}
def getSeatingMap(groups: List[SeatingCombination], sections: List[TheatreSection]): Map[Int, TheatreSection] = {
sections.foldLeft(Map.empty[Int, TheatreSection]) { (m, sect) =>
val bestFitOpt = groups.find(g => {
g.seatCount <= sect.sectionSize && !isAnyListIdInMap(m, g.idList)
}).filter(_.idList.nonEmpty)
val newEntries = bestFitOpt.map(_.idList.map(_ -> sect)).getOrElse(List.empty)
m ++ newEntries
}
}
def isAnyListIdInMap(map: Map[Int, TheatreSection], list: List[Int]): Boolean = {
(for (id <- list) yield map.get(id).isDefined).reduce(_ || _)
}

Getting the parameters of a case class through Reflection

As a follow up of
Matt R's question, as Scala 2.10 has been out for quite an amount of time, what would be the best way to extract the fields and values of a case class. Taking a similar example:
case class Colour(red: Int, green: Int, blue: String) {
val other: Int = 42
}
val RBG = Colour(1,3,"isBlue")
I want to get a list (or array or any iterator for that matter) that would have the fields declared in the constructor as tuple values like these:
[(red, 1),(green, 3),(blue, "isBlue")]
I know the fact that there are a lot of examples on the net regarding the same issue but as I said, I wanted to know what should be the most ideal way to extract the required information
If you use Scala 2.10 reflection, this answer is half of the things you need. It will give you the method symbols of the case class, so you know the order and names of arguments:
import scala.reflect.runtime.{universe => ru}
import ru._
def getCaseMethods[T: TypeTag] = typeOf[T].members.collect {
case m: MethodSymbol if m.isCaseAccessor => m
}.toList
case class Person(name: String, age: Int)
getCaseMethods[Person] // -> List(value age, value name)
You can call .name.toString on these methods to get the corresponding method names.
The next step is to invoke these methods on a given instance. You need a runtime mirror for that
val rm = runtimeMirror(getClass.getClassLoader)
Then you can "mirror" an actual instance:
val p = Person("foo", 33)
val pr = rm.reflect(p)
Then you can reflect on pr each method using reflectMethod and execute it via apply. Without going through each step separately, here is a solution altogether (see the val value = line for the mechanism of extracting a parameter's value):
def caseMap[T: TypeTag: reflect.ClassTag](instance: T): List[(String, Any)] = {
val im = rm.reflect(instance)
typeOf[T].members.collect {
case m: MethodSymbol if m.isCaseAccessor =>
val name = m.name.toString
val value = im.reflectMethod(m).apply()
(name, value)
} (collection.breakOut)
}
caseMap(p) // -> List(age -> 33, name -> foo)
Every case object is a product, therefore you can use an iterator to get all its parameters' names and another iterator to get all its parameters' values:
case class Colour(red: Int, green: Int, blue: String) {
val other: Int = 42
}
val rgb = Colour(1, 3, "isBlue")
val names = rgb.productElementNames.toList // List(red, green, blue)
val values = rgb.productIterator.toList // List(1, 3, isBlue)
names.zip(values).foreach(print) // (red,1)(green,3)(blue,isBlue)
By product I mean both Cartesian product and an instance of Product. This requires Scala 2.13.0; although Product was available before, the iterator to get elements' names was only added in version 2.13.0.
Notice that no reflection is needed.

Allocation of Function Literals in Scala

I have a class that represents sales orders:
class SalesOrder(val f01:String, val f02:Int, ..., f50:Date)
The fXX fields are of various types. I am faced with the problem of creating an audit trail of my orders. Given two instances of the class, I have to determine which fields have changed. I have come up with the following:
class SalesOrder(val f01:String, val f02:Int, ..., val f50:Date){
def auditDifferences(that:SalesOrder): List[String] = {
def diff[A](fieldName:String, getField: SalesOrder => A) =
if(getField(this) != getField(that)) Some(fieldName) else None
val diffList = diff("f01", _.f01) :: diff("f02", _.f02) :: ...
:: diff("f50", _.f50) :: Nil
diffList.flatten
}
}
I was wondering what the compiler does with all the _.fXX functions: are they instanced just once (statically), and can be shared by all instances of my class, or will they be instanced every time I create an instance of my class?
My worry is that, since I will use a lot of SalesOrder instances, it may create a lot of garbage. Should I use a different approach?
One clean way of solving this problem would be to use the standard library's Ordering type class. For example:
class SalesOrder(val f01: String, val f02: Int, val f03: Char) {
def diff(that: SalesOrder) = SalesOrder.fieldOrderings.collect {
case (name, ord) if !ord.equiv(this, that) => name
}
}
object SalesOrder {
val fieldOrderings: List[(String, Ordering[SalesOrder])] = List(
"f01" -> Ordering.by(_.f01),
"f02" -> Ordering.by(_.f02),
"f03" -> Ordering.by(_.f03)
)
}
And then:
scala> val orderA = new SalesOrder("a", 1, 'a')
orderA: SalesOrder = SalesOrder#5827384f
scala> val orderB = new SalesOrder("b", 1, 'b')
orderB: SalesOrder = SalesOrder#3bf2e1c7
scala> orderA diff orderB
res0: List[String] = List(f01, f03)
You almost certainly don't need to worry about the perfomance of your original formulation, but this version is (arguably) nicer for unrelated reasons.
Yes, that creates 50 short lived functions. I don't think you should be worried unless you have manifest evidence that that causes a performance problem in your case.
But I would define a method that transforms SalesOrder into a Map[String, Any], then you would just have
trait SalesOrder {
def fields: Map[String, Any]
}
def diff(a: SalesOrder, b: SalesOrder): Iterable[String] = {
val af = a.fields
val bf = b.fields
af.collect { case (key, value) if bf(key) != value => key }
}
If the field names are indeed just incremental numbers, you could simplify
trait SalesOrder {
def fields: Iterable[Any]
}
def diff(a: SalesOrder, b: SalesOrder): Iterable[String] =
(a.fields zip b.fields).zipWithIndex.collect {
case ((av, bv), idx) if av != bv => f"f${idx + 1}%02d"
}