Scala Enum.values.map(_.id).contains(value) cost much time - scala

I want to check if a specify id that contained in an Enumeration.
So I write down the contains function
object Enum extends Enumeration {
type Enum = Value
val A = Value(2, "A")
def contains(value: Int): Boolean = {
Enum.values.map(_.id).contains(value)
}
}
But the time cost is unexpected while id is a big number, such as over eight-digit
val A = Value(222222222, "A")
Then the contains function cost over 1000ms per calling.
And I also noticed the first time calling always cost hundreds millisecond whether the id is big or small.
I can't figure out why.

First, lets talk about the cost of Enum.values. This is implemented here:
See here: https://github.com/scala/scala/blob/0b47dc2f28c997aed86d6f615da00f48913dd46c/src/library/scala/Enumeration.scala#L83
The implementation is essentially setting up a mutable map. Once it is set up, it is re-used.
The cost for big numbers in your Value is because, internally Scala library uses a BitSet.
See here: https://github.com/scala/scala/blob/0b47dc2f28c997aed86d6f615da00f48913dd46c/src/library/scala/Enumeration.scala#L245
So, for larger numbers, BitSet will be bigger. That only happens when you call Enum.values.
Depending on your specific uses case you can choose between using Enumeration or Case Object:
Case objects vs Enumerations in Scala

It sure looks like the mechanics of Enumeration don't handle large ints well in that position. The Scaladocs for the class don't say anything about this, but they don't advertise using Enumeration.Value the way you do either. They say, e.g., val A = Value, where you say val A = Value(2000, "A").
If you want to keep your contains method as you have it, why don't you cache the Enum.values.map(_.id)? Much faster.
object mult extends App {
object Enum extends Enumeration {
type Enum = Value
val A1 = Value(1, "A")
val A2 = Value(2, "A")
val A222 = Enum.Value(222222222, "A")
def contains(value: Int): Boolean = {
Enum.values.map(_.id).contains(value)
}
val cache = Enum.values.map(_.id)
def contains2(value: Int): Boolean = {
cache.contains(value)
}
}
def clockit(desc: String, f: => Unit) = {
val start = System.currentTimeMillis
f
val end = System.currentTimeMillis
println(s"$desc ${end - start}")
}
clockit("initialize Enum ", Enum.A1)
clockit("contains 2 ", Enum.contains(2))
clockit("contains 222222222 ", Enum.contains(222222222))
clockit("contains 222222222 ", Enum.contains(222222222))
clockit("contains2 2 ", Enum.contains2(2))
clockit("contains2 222222222", Enum.contains2(222222222))
}

Related

Scala : How to pass a class field into a method

I'm new to Scala and attempting to do some data analysis.
I have a CSV files with a few headers - lets say item no., item type, month, items sold.
I have made an Item class with the fields of the headers.
I split the CSV into a list with each iteration of the list being a row of the CSV file being represented by the Item class.
I am attempting to make a method that will create maps based off of the parameter I send in. For example if I want to group the items sold by month, or by item type. However I am struggling to send the Item.field into a method.
F.e what I am attempting is something like:
makemaps(Item.month);
makemaps(Item.itemtype);
def makemaps(Item.field):
if (item.field==Item.month){}
else (if item.field==Item.itemType){}
However my logic for this appears to be wrong. Any ideas?
def makeMap[T](items: Iterable[Item])(extractKey: Item => T): Map[T, Iterable[Item]] =
items.groupBy(extractKey)
So given this example Item class:
case class Item(month: String, itemType: String, quantity: Int, description: String)
You could have (I believe the type ascriptions are mandatory):
val byMonth = makeMap[String](items)(_.month)
val byType = makeMap[String](items)(_.itemType)
val byQuantity = makeMap[Int](items)(_.quantity)
val byDescription = makeMap[String](items)(_.description)
Note that _.month, for instance, creates a function taking an Item which results in the String contained in the month field (simplifying a little).
You could, if so inclined, save the functions used for extracting keys in the companion object:
object Item {
val month: Item => String = _.month
val itemType: Item => String = _.itemType
val quantity: Item => Int = _.quantity
val description: Item => String = _.description
// Allows us to determine if using a predefined extractor or using an ad hoc one
val extractors: Set[Item => Any] = Set(month, itemType, quantity, description)
}
Then you can pass those around like so:
val byMonth = makeMap[String](items)(Item.month)
The only real change semantically is that you explicitly avoid possible extra construction of lambdas at runtime, at the cost of having the lambdas stick around in memory the whole time. A fringe benefit is that you might be able to cache the maps by extractor if you're sure that the source Items never change: for lambdas, equality is reference equality. This might be particularly useful if you have some class representing the collection of Items as opposed to just using a standard collection, like so:
object Items {
def makeMap[T](items: Iterable[Item])(extractKey: Item => T): Map[T,
Iterable[Item]] =
items.groupBy(extractKey)
}
class Items(val underlying: immutable.Seq[Item]) {
def makeMap[T](extractKey: Item => T): Map[T, Iterable[Item]] =
if (Item.extractors.contains(extractKey)) {
if (extractKey == Item.month) groupedByMonth.asInstanceOf[Map[T, Iterable[Item]]]
else if (extractKey == Item.itemType) groupedByItemType.asInstanceOf[Map[T, Iterable[Item]]]
else if (extractKey == Item.quantity) groupedByQuantity.asInstanceOf[Map[T, Iterable[Item]]]
else if (extractKey == Item.description) groupedByDescription.asInstanceOf[Map[T, Iterable[Item]]]
else throw new AssertionError("Shouldn't happen!")
} else {
Items.makeMap(underlying)(extractKey)
}
lazy val groupedByMonth = Items.makeMap[String](underlying)(Item.month)
lazy val groupedByItemType = Items.makeMap[String](underlying)(Item.itemType)
lazy val groupedByQuantity = Items.makeMap[Int](underlying)(Item.quantity)
lazy val groupedByDescription = Items.makeMap[String](underlying)(Item.description)
}
(that is almost certainly a personal record for asInstanceOfs in a small block of code... I'm not sure if I should be proud or ashamed of this snippet)

Immutable paired instances of Case Classes in Scala?

I'm trying to model a relationship which can be reversed. For example, the reverse of North might be South. The reverse of Left might be Right. I'd like to use a case class to represent my relationships. I found a similar solution that uses case Objects here, but it's not quite what I want, here.
Here's my non-functional code:
case class Relationship(name: String, opposite:Relationship)
def relationshipFactory(nameA:String, nameB:String): Relationship = {
lazy val x:Relationship = Relationship(nameA, Relationship(nameB, x))
x
}
val ns = relationshipFactory("North", "South")
ns // North
ns.opposite // South
ns.opposite.opposite // North
ns.opposite.opposite.opposite // South
Can this code be changed so that:
It dosen't crash
I can create these things on demand as pairs.
If you really want to build graphs of immutable objects with circular dependencies, you have to declare opposite as def, and (preferably) throw one more lazy val into the mix:
abstract class Relationship(val name: String) {
def opposite: Relationship
}
object Relationship {
/** Factory method */
def apply(nameA: String, nameB: String): Relationship = {
lazy val x: Relationship = new Relationship(nameA) {
lazy val opposite = new Relationship(nameB) {
def opposite = x
}
}
x
}
/** Extractor */
def unapply(r: Relationship): Option[(String, Relationship)] =
Some((r.name, r.opposite))
}
val ns = Relationship("North", "South")
println(ns.name)
println(ns.opposite.name)
println(ns.opposite.opposite.name)
println(ns.opposite.opposite.opposite.name)
You can quickly convince yourself that nothing bad happens if you run a few million rounds on this circle of circular dependencies:
// just to demonstrate that it doesn't blow up in any way if you
// call it hundred million times:
// Should be "North"
println((1 to 100000000).foldLeft(ns)((r, _) => r.opposite).name)
It indeed prints "North". It doesn work with case classes, but you can always add your own extractors, so this works:
val Relationship(x, op) = ns
val Relationship(y, original) = op
println(s"Extracted x = $x y = $y")
It prints "North" and "South" for x and y.
However, the more obvious thing to do would be to just save both components of a relation, and add opposite as a method that constructs the opposite pair.
case class Rel(a: String, b: String) {
def opposite: Rel = Rel(b, a)
}
Actually, this is already implemented in the standard library:
scala> val rel = ("North", "South")
rel: (String, String) = (North,South)
scala> rel.swap
res0: (String, String) = (South,North)
you have cyclic dependencies, this won't work. One option is to do:
case class Relationship(name: String)
and have a setter to specify the opposite. The factory would then do:
def relationshipFactory(nameA:String, nameB:String): Relationship = {
val x:Relationship = Relationship(nameA)
val opposite = Relationship(nameB)
x.setOpposite(opposite)
opposite.setOpposite(x)
x
}
another option:
case class Relationship(name: String) {
lazy val opposite = Utils.computeOpposite(this)
}
and have the opposite logic on the Utils object
yet another option: probably you don't want several South instances, so you should use case objects or enums (more on that at http://pedrorijo.com/blog/scala-enums/)
Using enums you can use pattern matching to do that logic without no overhead

Is it possible to reference count call location?

This question isn't programming language specific (the more general the better), but I'm working in Scala (not necessarily on the JVM). Is there a means to reference count by call location, not the number of total calls? In particular, it would be great to be able to detect if a given method is called from more than one call location.
I think I can fake it to some extent by doing a reference equality check with a function, but this could be abused easily by having a global-ish token, or even calling the function multiple times in the same scope:
sealed case class Token();
class MyClass[A] {
var tokenOpt: Option[Token] = None
def callMeFromOnePlace(x: A)(implicit tk: Token) = {
tokenOpt match {
case Some(priorTk) => if (priorTk ne tk) throw new IllegalStateException("")
case None => tokenOpt = Some(tk)
}
// Do some work ...
}
}
Then this should work fine:
val myObj = new MyClass[Int]
val myIntList = List(1,2,3)
implicit val token = Token()
myIntList.map(ii => myObj.callMeFromOnePlace(ii))
But unfortunately, so would this:
val myObj = new MyClass[Int]
implicit val token = Token()
myObj.callMeFromOnePlace(1)
myObj.callMeFromOnePlace(1) //oops, want this to fail
When you are talking about call location, it can be represented by a call stack trace. Here is a simple example:
// keep track of calls here (you can use immutable style if you want)
var callCounts = Map.empty[Int, Int]
def f(): Unit = {
// calculate call stack trace hashCode for more efficient storage
// .toSeq makes WrappedArray, that knows how to properly calculate .hashCode()
val hashCode = new RuntimeException().getStackTrace.toSeq.hashCode()
val callLocation = hashCode
callCounts += (callLocation -> (callCounts.getOrElse(callLocation, 0) + 1))
}
List(1,2,3).foreach(_ =>
f()
)
f()
f()
println(callCounts) // Map(75070239 -> 3, 900408638 -> 1, -1658734417 -> 1)
I am not completely clear what you want to do but for your //oops.. example to fail you need just check the PriorTk is not None. (do note that it is not a thread safe solution )
For completeness, enforcing these kind of constraints from a type system perspective requires linear types.

How can I extend Scala collections with member values?

Say I have the following data structure:
case class Timestamped[CC[M] < Seq[M]](elems : CC, timestamp : String)
So it's essentially a sequence with an attribute -- a timestamp -- attached to it. This works fine and I could create new instances with the syntax
val t = Timestamped(Seq(1,2,3,4),"2014-02-25")
t.elems.head // 1
t.timestamp // "2014-05-25"
The syntax is unwieldly and instead I want to be able to do something like:
Timestamped(1,2,3,4)("2014-02-25")
t.head // 1
t.timestamp // "2014-05-25"
Where timestamped is just an extension of a Seq and it's implementation SeqLike, with a single attribute val timestamp : String.
This seems easy to do; just use a Seq with a mixin TimestampMixin { val timestamp : String }. But I can't figure out how to create the constructor. My question is: how do I create a constructor in the companion object, that creates a sequence with an extra member value? The signature is as follows:
object Timestamped {
def apply(elems: M*)(timestamp : String) : Seq[M] with TimestampMixin = ???
}
You'll see that it's not straightforward; collections use Builders to instantiate themselves, so I can't simply call the constructor an override some vals.
Scala collections are very complicated structures when it comes down to it. Extending Seq requires implementing apply, length, and iterator methods. In the end, you'll probably end up duplicating existing code for List, Set, or something else. You'll also probably have to worry about CanBuildFroms for your collection, which in the end I don't think is worth it if you just want to add a field.
Instead, consider an implicit conversion from your Timestamped type to Seq.
case class Timestamped[A](elems: Seq[A])(timestamp: String)
object Timestamped {
implicit def toSeq[A](ts: Timestamped[A]): Seq[A] = ts.elems
}
Now, whenever I try to call a method from Seq, the compiler will implicitly convert Timestamped to Seq, and we can proceed as normal.
scala> val ts = Timestamped(List(1,2,3,4))("1/2/34")
ts: Timestamped[Int] = Timestamped(List(1, 2, 3, 4))
scala> ts.filter(_ > 2)
res18: Seq[Int] = List(3, 4)
There is one major drawback here, and it's that we're now stuck with Seq after performing operations on the original Timestamped.
Go the other way... extend Seq, it only has 3 abstract members:
case class Stamped[T](elems: Seq[T], stamp: Long) extends Seq[T] {
override def apply(i: Int) = elems.apply(i)
override def iterator = elems.iterator
override def length = elems.length
}
val x = Stamped(List(10,20,30), 15L)
println(x.head) // 10
println(x.timeStamp) // 15
println(x.map { _ * 10}) // List(100, 200, 300)
println(x.filter { _ > 20}) // List(30)
Keep in mind, this only works as long as Seq is specific enough for your use cases, if you later find you need more complex collection behavior this may become untenable.
EDIT: Added a version closer to the signature you were trying to create. Not sure if this helps you any more:
case class Stamped[T](elems: T*)(stamp: Long) extends Seq[T] {
def timeStamp = stamp
override def apply(i: Int) = elems.apply(i)
override def iterator = elems.iterator
override def length = elems.length
}
val x = Stamped(10,20,30)(15L)
println(x.head) // 10
println(x.timeStamp) // 15
println(x.map { _ * 10}) // List(100, 200, 300)
println(x.filter { _ > 20}) // List(30)
Where elems would end up being a generically created WrappedArray.

What is the ideal collection for incremental (with multiple passings) filtering of collection?

I've seen many questions about Scala collections and could not decide.
This question was the most useful until now.
I think the core of the question is twofold:
1) Which are the best collections for this use case?
2) Which are the recommended ways to use them?
Details:
I am implementing an algorithm that iterates over all elements in a collection
searching for the one that matches a certain criterion.
After the search, the next step is to search again with a new criterion, but without the chosen element among the possibilities.
The idea is to create a sequence with all original elements ordered by the criterion (which changes at every new selection).
The original sequence doesn't really need to be ordered, but there can be duplicates (the algorithm will only pick one at a time).
Example with a small sequence of Ints (just to simplify):
object Foo extends App {
def f(already_selected: Seq[Int])(element: Int): Double =
// something more complex happens here,
// specially something take takes 'already_selected' into account
math.sqrt(element)
//call to the algorithm
val (result, ti) = Tempo.time(recur(Seq.fill(9900)(Random.nextInt), Seq()))
println("ti = " + ti)
//algorithm
def recur(collection: Seq[Int], already_selected: Seq[Int]): (Seq[Int], Seq[Int]) =
if (collection.isEmpty) (Seq(), already_selected)
else {
val selected = collection maxBy f(already_selected)
val rest = collection diff Seq(selected) //this part doesn't seem to be efficient
recur(rest, selected +: already_selected)
}
}
object Tempo {
def time[T](f: => T): (T, Double) = {
val s = System.currentTimeMillis
(f, (System.currentTimeMillis - s) / 1000d)
}
}
Try #inline and as icn suggested How can I idiomatically "remove" a single element from a list in Scala and close the gap?:
object Foo extends App {
#inline
def f(already_selected: Seq[Int])(element: Int): Double =
// something more complex happens here,
// specially something take takes 'already_selected' into account
math.sqrt(element)
//call to the algorithm
val (result, ti) = Tempo.time(recur(Seq.fill(9900)(Random.nextInt()).zipWithIndex, Seq()))
println("ti = " + ti)
//algorithm
#tailrec
def recur(collection: Seq[(Int, Int)], already_selected: Seq[Int]): Seq[Int] =
if (collection.isEmpty) already_selected
else {
val (selected, i) = collection.maxBy(x => f(already_selected)(x._2))
val rest = collection.patch(i, Nil, 1) //this part doesn't seem to be efficient
recur(rest, selected +: already_selected)
}
}
object Tempo {
def time[T](f: => T): (T, Double) = {
val s = System.currentTimeMillis
(f, (System.currentTimeMillis - s) / 1000d)
}
}