How can I extend Scala collections with member values? - scala

Say I have the following data structure:
case class Timestamped[CC[M] < Seq[M]](elems : CC, timestamp : String)
So it's essentially a sequence with an attribute -- a timestamp -- attached to it. This works fine and I could create new instances with the syntax
val t = Timestamped(Seq(1,2,3,4),"2014-02-25")
t.elems.head // 1
t.timestamp // "2014-05-25"
The syntax is unwieldly and instead I want to be able to do something like:
Timestamped(1,2,3,4)("2014-02-25")
t.head // 1
t.timestamp // "2014-05-25"
Where timestamped is just an extension of a Seq and it's implementation SeqLike, with a single attribute val timestamp : String.
This seems easy to do; just use a Seq with a mixin TimestampMixin { val timestamp : String }. But I can't figure out how to create the constructor. My question is: how do I create a constructor in the companion object, that creates a sequence with an extra member value? The signature is as follows:
object Timestamped {
def apply(elems: M*)(timestamp : String) : Seq[M] with TimestampMixin = ???
}
You'll see that it's not straightforward; collections use Builders to instantiate themselves, so I can't simply call the constructor an override some vals.

Scala collections are very complicated structures when it comes down to it. Extending Seq requires implementing apply, length, and iterator methods. In the end, you'll probably end up duplicating existing code for List, Set, or something else. You'll also probably have to worry about CanBuildFroms for your collection, which in the end I don't think is worth it if you just want to add a field.
Instead, consider an implicit conversion from your Timestamped type to Seq.
case class Timestamped[A](elems: Seq[A])(timestamp: String)
object Timestamped {
implicit def toSeq[A](ts: Timestamped[A]): Seq[A] = ts.elems
}
Now, whenever I try to call a method from Seq, the compiler will implicitly convert Timestamped to Seq, and we can proceed as normal.
scala> val ts = Timestamped(List(1,2,3,4))("1/2/34")
ts: Timestamped[Int] = Timestamped(List(1, 2, 3, 4))
scala> ts.filter(_ > 2)
res18: Seq[Int] = List(3, 4)
There is one major drawback here, and it's that we're now stuck with Seq after performing operations on the original Timestamped.

Go the other way... extend Seq, it only has 3 abstract members:
case class Stamped[T](elems: Seq[T], stamp: Long) extends Seq[T] {
override def apply(i: Int) = elems.apply(i)
override def iterator = elems.iterator
override def length = elems.length
}
val x = Stamped(List(10,20,30), 15L)
println(x.head) // 10
println(x.timeStamp) // 15
println(x.map { _ * 10}) // List(100, 200, 300)
println(x.filter { _ > 20}) // List(30)
Keep in mind, this only works as long as Seq is specific enough for your use cases, if you later find you need more complex collection behavior this may become untenable.
EDIT: Added a version closer to the signature you were trying to create. Not sure if this helps you any more:
case class Stamped[T](elems: T*)(stamp: Long) extends Seq[T] {
def timeStamp = stamp
override def apply(i: Int) = elems.apply(i)
override def iterator = elems.iterator
override def length = elems.length
}
val x = Stamped(10,20,30)(15L)
println(x.head) // 10
println(x.timeStamp) // 15
println(x.map { _ * 10}) // List(100, 200, 300)
println(x.filter { _ > 20}) // List(30)
Where elems would end up being a generically created WrappedArray.

Related

Scala Enum.values.map(_.id).contains(value) cost much time

I want to check if a specify id that contained in an Enumeration.
So I write down the contains function
object Enum extends Enumeration {
type Enum = Value
val A = Value(2, "A")
def contains(value: Int): Boolean = {
Enum.values.map(_.id).contains(value)
}
}
But the time cost is unexpected while id is a big number, such as over eight-digit
val A = Value(222222222, "A")
Then the contains function cost over 1000ms per calling.
And I also noticed the first time calling always cost hundreds millisecond whether the id is big or small.
I can't figure out why.
First, lets talk about the cost of Enum.values. This is implemented here:
See here: https://github.com/scala/scala/blob/0b47dc2f28c997aed86d6f615da00f48913dd46c/src/library/scala/Enumeration.scala#L83
The implementation is essentially setting up a mutable map. Once it is set up, it is re-used.
The cost for big numbers in your Value is because, internally Scala library uses a BitSet.
See here: https://github.com/scala/scala/blob/0b47dc2f28c997aed86d6f615da00f48913dd46c/src/library/scala/Enumeration.scala#L245
So, for larger numbers, BitSet will be bigger. That only happens when you call Enum.values.
Depending on your specific uses case you can choose between using Enumeration or Case Object:
Case objects vs Enumerations in Scala
It sure looks like the mechanics of Enumeration don't handle large ints well in that position. The Scaladocs for the class don't say anything about this, but they don't advertise using Enumeration.Value the way you do either. They say, e.g., val A = Value, where you say val A = Value(2000, "A").
If you want to keep your contains method as you have it, why don't you cache the Enum.values.map(_.id)? Much faster.
object mult extends App {
object Enum extends Enumeration {
type Enum = Value
val A1 = Value(1, "A")
val A2 = Value(2, "A")
val A222 = Enum.Value(222222222, "A")
def contains(value: Int): Boolean = {
Enum.values.map(_.id).contains(value)
}
val cache = Enum.values.map(_.id)
def contains2(value: Int): Boolean = {
cache.contains(value)
}
}
def clockit(desc: String, f: => Unit) = {
val start = System.currentTimeMillis
f
val end = System.currentTimeMillis
println(s"$desc ${end - start}")
}
clockit("initialize Enum ", Enum.A1)
clockit("contains 2 ", Enum.contains(2))
clockit("contains 222222222 ", Enum.contains(222222222))
clockit("contains 222222222 ", Enum.contains(222222222))
clockit("contains2 2 ", Enum.contains2(2))
clockit("contains2 222222222", Enum.contains2(222222222))
}

Creating a modified `filter` function

Consider the filter function.
I am interested in the following modifications of the filter function, if possible:
We know for a collection we can do:
case class People(val age: Int)
val a: List[People] = ...
a.filter(i => i.age ==10 )
Or more simply:
a.filter(_.age==10 )
Any simple way I can define another modified filter that works just like the following (no underline)
a.myfilter1( age==10 )
the filter function does not work when its argument is no Boolean. Suppose I want to create a modified filter that when a non-Boolean is given, it translates to equality automatically. Here is an example:
val anotherPerson: People = ...
a.myFilter2(anotherPerson)
I want the above myFilter2 to get translated as following:
a.filter(_.equals(anotherPerson))
Using implicit def:
case class MyFilterable[T](seq: Seq[T]) {
def suchAFilter(v: Any): Seq[T] = {
seq.filter(v.equals)
}
}
implicit def strongFilter[T](seq: Seq[T]): MyFilterable[T] = {
MyFilterable(seq)
}
println(List(1,2,3).suchAFilter(2))

Getting the parameters of a case class through Reflection

As a follow up of
Matt R's question, as Scala 2.10 has been out for quite an amount of time, what would be the best way to extract the fields and values of a case class. Taking a similar example:
case class Colour(red: Int, green: Int, blue: String) {
val other: Int = 42
}
val RBG = Colour(1,3,"isBlue")
I want to get a list (or array or any iterator for that matter) that would have the fields declared in the constructor as tuple values like these:
[(red, 1),(green, 3),(blue, "isBlue")]
I know the fact that there are a lot of examples on the net regarding the same issue but as I said, I wanted to know what should be the most ideal way to extract the required information
If you use Scala 2.10 reflection, this answer is half of the things you need. It will give you the method symbols of the case class, so you know the order and names of arguments:
import scala.reflect.runtime.{universe => ru}
import ru._
def getCaseMethods[T: TypeTag] = typeOf[T].members.collect {
case m: MethodSymbol if m.isCaseAccessor => m
}.toList
case class Person(name: String, age: Int)
getCaseMethods[Person] // -> List(value age, value name)
You can call .name.toString on these methods to get the corresponding method names.
The next step is to invoke these methods on a given instance. You need a runtime mirror for that
val rm = runtimeMirror(getClass.getClassLoader)
Then you can "mirror" an actual instance:
val p = Person("foo", 33)
val pr = rm.reflect(p)
Then you can reflect on pr each method using reflectMethod and execute it via apply. Without going through each step separately, here is a solution altogether (see the val value = line for the mechanism of extracting a parameter's value):
def caseMap[T: TypeTag: reflect.ClassTag](instance: T): List[(String, Any)] = {
val im = rm.reflect(instance)
typeOf[T].members.collect {
case m: MethodSymbol if m.isCaseAccessor =>
val name = m.name.toString
val value = im.reflectMethod(m).apply()
(name, value)
} (collection.breakOut)
}
caseMap(p) // -> List(age -> 33, name -> foo)
Every case object is a product, therefore you can use an iterator to get all its parameters' names and another iterator to get all its parameters' values:
case class Colour(red: Int, green: Int, blue: String) {
val other: Int = 42
}
val rgb = Colour(1, 3, "isBlue")
val names = rgb.productElementNames.toList // List(red, green, blue)
val values = rgb.productIterator.toList // List(1, 3, isBlue)
names.zip(values).foreach(print) // (red,1)(green,3)(blue,isBlue)
By product I mean both Cartesian product and an instance of Product. This requires Scala 2.13.0; although Product was available before, the iterator to get elements' names was only added in version 2.13.0.
Notice that no reflection is needed.

Allocation of Function Literals in Scala

I have a class that represents sales orders:
class SalesOrder(val f01:String, val f02:Int, ..., f50:Date)
The fXX fields are of various types. I am faced with the problem of creating an audit trail of my orders. Given two instances of the class, I have to determine which fields have changed. I have come up with the following:
class SalesOrder(val f01:String, val f02:Int, ..., val f50:Date){
def auditDifferences(that:SalesOrder): List[String] = {
def diff[A](fieldName:String, getField: SalesOrder => A) =
if(getField(this) != getField(that)) Some(fieldName) else None
val diffList = diff("f01", _.f01) :: diff("f02", _.f02) :: ...
:: diff("f50", _.f50) :: Nil
diffList.flatten
}
}
I was wondering what the compiler does with all the _.fXX functions: are they instanced just once (statically), and can be shared by all instances of my class, or will they be instanced every time I create an instance of my class?
My worry is that, since I will use a lot of SalesOrder instances, it may create a lot of garbage. Should I use a different approach?
One clean way of solving this problem would be to use the standard library's Ordering type class. For example:
class SalesOrder(val f01: String, val f02: Int, val f03: Char) {
def diff(that: SalesOrder) = SalesOrder.fieldOrderings.collect {
case (name, ord) if !ord.equiv(this, that) => name
}
}
object SalesOrder {
val fieldOrderings: List[(String, Ordering[SalesOrder])] = List(
"f01" -> Ordering.by(_.f01),
"f02" -> Ordering.by(_.f02),
"f03" -> Ordering.by(_.f03)
)
}
And then:
scala> val orderA = new SalesOrder("a", 1, 'a')
orderA: SalesOrder = SalesOrder#5827384f
scala> val orderB = new SalesOrder("b", 1, 'b')
orderB: SalesOrder = SalesOrder#3bf2e1c7
scala> orderA diff orderB
res0: List[String] = List(f01, f03)
You almost certainly don't need to worry about the perfomance of your original formulation, but this version is (arguably) nicer for unrelated reasons.
Yes, that creates 50 short lived functions. I don't think you should be worried unless you have manifest evidence that that causes a performance problem in your case.
But I would define a method that transforms SalesOrder into a Map[String, Any], then you would just have
trait SalesOrder {
def fields: Map[String, Any]
}
def diff(a: SalesOrder, b: SalesOrder): Iterable[String] = {
val af = a.fields
val bf = b.fields
af.collect { case (key, value) if bf(key) != value => key }
}
If the field names are indeed just incremental numbers, you could simplify
trait SalesOrder {
def fields: Iterable[Any]
}
def diff(a: SalesOrder, b: SalesOrder): Iterable[String] =
(a.fields zip b.fields).zipWithIndex.collect {
case ((av, bv), idx) if av != bv => f"f${idx + 1}%02d"
}

How to use scala.util.Sorting.quickSort() with arbitrary types?

I need to sort an array of pairs by second element. How do I pass comparator for my pairs to the quickSort function?
I'm using the following ugly approach now:
type AccResult = (AccUnit, Long) // pair
class Comparator(a:AccResult) extends Ordered[AccResult] {
def compare(that:AccResult) = lessCompare(a, that)
def lessCompare(a:AccResult, that:AccResult) = if (a._2 == that._2) 0 else if (a._2 < that._2) -1 else 1
}
scala.util.Sorting.quickSort(data)(d => new Comparator(d))
Why is quickSort designed to have an ordered view instead of usual comparator argument?
Scala 2.7 solutions are preferred.
I tend to prefer the non-implicit arguments unless its being used in more than a few places.
type Pair = (String,Int)
val items : Array[Pair] = Array(("one",1),("three",3),("two",2))
quickSort(items)(new Ordering[Pair] {
def compare(x: Pair, y: Pair) = {
x._2 compare y._2
}
})
Edit: After learning about view bounds in another question, I think that this approach might be better:
val items : Array[(String,Int)] = Array(("one",1),("three",3),("two",2))
class OrderTupleBySecond[X,Y <% Comparable[Y]] extends Ordering[(X,Y)] {
def compare(x: (X,Y), y: (X,Y)) = {
x._2 compareTo y._2
}
}
util.Sorting.quickSort(items)(new OrderTupleBySecond[String,Int])
In this way, OrderTupleBySecond could be used for any Tuple2 type where the type of the 2nd member of the tuple has a view in scope which would convert it to a Comparable.
Ok, I'm not sure exactly what you are unhappy about what you are currently doing, but perhaps all you are looking for is this?
implicit def toComparator(a: AccResult) = new Comparator(a)
scala.util.Sorting.quickSort(data)
If, on the other hand, the problem is that the tuple is Ordered and you want a different ordering, well, that's why it changed on Scala 2.8.
* EDIT *
Ouch! Sorry, I only now realize you said you preferred Scala 2.7 solutions. I have editted this answer soon to put the solution for 2.7 above. What follows is a 2.8 solution.
Scala 2.8 expects an Ordering, not an Ordered, which is a context bound, not a view bound. You'd write your code in 2.8 like this:
type AccResult = (AccUnit, Long) // pair
implicit object AccResultOrdering extends Ordering[AccResult] {
def compare(x: AccResult, y: AccResult) = if (x._2 == y._2) 0 else if (x._2 < y._2) -1 else 1
}
Or maybe just:
type AccResult = (AccUnit, Long) // pair
implicit val AccResultOrdering = Ordering by ((_: AccResult)._2)
And use it like:
scala.util.Sorting.quickSort(data)
On the other hand, the usual way to do sort in Scala 2.8 is just to call one of the sorting methods on it, such as:
data.sortBy((_: AccResult)._2)
Have your type extend Ordered, like so:
case class Thing(number : Integer, name: String) extends Ordered[Thing] {
def compare(that: Thing) = name.compare(that.name)
}
And then pass it to sort, like so:
val array = Array(Thing(4, "Doll"), Thing(2, "Monkey"), Thing(7, "Green"))
scala.util.Sorting.quickSort(array)
Printing the array will give you:
array.foreach{ e => print(e) }
>> Thing(4,Doll) Thing(7,Green) Thing(2,Monkey)