Is there an efficient way to avoid repeated evaluation with mapValues? - scala

The mapValues method creates a new Map that modifies the results of queries to the original Map by applying the given function. If the same value is queried twice, the function passed to mapValues is called twice.
For example:
case class A(i: Int) {
  print("A")
}
case class B(a: A) {
  print("B")
}
case class C(b: B) {
  print("C")
}
val map = Map("One" -> 1)
  .mapValues(A)
  .mapValues(B)
  .mapValues(C)
val a = map.get("One")
val b = map.get("One")
This will print ABCABC because a new set of case class instances is created each time the value is queried.
How can I efficiently make this into a concrete Map that has pre-computed the mapValues functions? Ideally I would like a mechanism that does nothing if the Map already has concrete values.
I know that I can call map.map(identity) but this would re-compute the index for the Map which seems inefficient. The same is true if the last mapValues is converted to a map.
The view method will turn a strict Map into a non-strict Map, but there does not seem to be a method to do the opposite.

You can call force on the view to force evaluation:
scala> val strictMap = map.view.force
ABCstrictMap: scala.collection.immutable.Map[String,C] = Map(One -> C(B(A(1))))
scala> strictMap.get("One")
res1: Option[C] = Some(C(B(A(1))))
scala> strictMap.get("One")
res2: Option[C] = Some(C(B(A(1))))
I'd be careful about assuming that this will perform better than a simple map, though. Even if it does, the difference is likely to be lost in the noise, and it probably isn't worth the inconvenience if you need to cross-build for 2.11 or 2.12 and future Scala versions that will fix mapValues and change the view system entirely. (Scala 2.13 did in fact deprecate Map.mapValues in favour of .view.mapValues and redesign the view hierarchy.)
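For code that has to cross-build, one version-neutral sketch is to apply the function inside `map` itself. That builds a concrete `Map` eagerly, so each value is computed exactly once, while the new map is built. `expensive` below is a hypothetical stand-in for the `A`/`B`/`C` constructors, and the counter exists only to make the number of calls visible. Like `map(identity)`, this does rebuild the key index, which is the cost the question hoped to avoid.

```scala
// Cross-version sketch: build a strict Map instead of a lazy mapValues view.
object StrictMapDemo {
  // Counts how often the (hypothetical) expensive function runs.
  var calls = 0

  def expensive(i: Int): String = {
    calls += 1
    "value-" + i
  }

  val source: Map[String, Int] = Map("One" -> 1, "Two" -> 2)

  // `map` with a key -> value result builds a new immutable Map eagerly,
  // so `expensive` runs once per entry here and never again on lookup.
  val strict: Map[String, String] = source.map { case (k, v) => k -> expensive(v) }
}
```

`StrictMapDemo.strict.get("One")` can then be called any number of times while `calls` stays at 2 (one per entry). On Scala 2.13 the same effect is usually written as `source.view.mapValues(expensive).toMap`.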

Related

Why does each new instance of a case class evaluate lazy vals again in Scala?

From what I have understood, Scala treats val definitions as values.
So any instances of a case class with the same parameters should be equal.
But,
case class A(a: Int) {
  lazy val k = {
    println("k")
    1
  }
}
val a1 = A(5)
println(a1.k)
Output:
k
1
println(a1.k)
Output:
1
val a2 = A(5)
println(a2.k)
Output:
k
1
I was expecting that for println(a2.k), it should not print k.
Since this is not the required behavior, how should I implement this so that for all instances of a case class with same parameters, it should only execute a lazy val definition only once? Do I need some memoization technique, or can Scala handle this on its own?
I am very new to Scala and functional programming so please excuse me if you find the question trivial.
Assuming you're not overriding equals or doing something ill-advised like making the constructor args vars, it is the case that two case class instantiations with the same constructor arguments will be equal. However, this does not mean that two case class instantiations with the same constructor arguments will point to the same object in memory:
case class A(a: Int)
A(5) == A(5) // true, same as `A(5).equals(A(5))`
A(5) eq A(5) // false
If you want the constructor to always return the same object in memory, then you'll need to handle this yourself. Maybe use some sort of factory:
case class A private (a: Int) {
  lazy val k = {
    println("k")
    1
  }
}
object A {
  // Every instance ever built stays in this map; a plain mutable.Map is also not thread-safe.
  private[this] val cache = collection.mutable.Map[Int, A]()
  def build(a: Int) = {
    cache.getOrElseUpdate(a, A(a))
  }
}
val x = A.build(5)
x.k // prints k
val y = A.build(5)
y.k // doesn't print anything
x == y // true
x eq y // true
If, instead, you don't care about the constructor returning the same object, but you just care about the re-evaluation of k, you can just cache that part:
case class A(a: Int) {
  lazy val k = A.kCache.getOrElseUpdate(a, {
    println("k")
    1
  })
}
object A {
  private[A] val kCache = collection.mutable.Map[Int, Int]()
}
A(5).k // prints k
A(5).k // doesn't print anything
The trivial answer is "this is what the language does according to the spec". That's the correct but not very satisfying answer. It's more interesting why it does this.
It might be clearer that it has to do this with a different example:
case class A[B](b: B) {
  lazy val k = {
    println(b)
    1
  }
}
When you're constructing two A's, you can't know whether they are equal, because you haven't defined what it means for them to be equal (or what it means for B's to be equal). And you can't statically initialize k either, as it depends on the passed-in B.
Since this has to print twice, it would be quite unintuitive if a lazy val were re-evaluated only when k depends on b and not when it doesn't.
When you ask
how should I implement this so that for all instances of a case class with same parameters, it should only execute a lazy val definition only once
that's a trickier question than it sounds. You make "the same parameters" sound like something that can be known at compile time without further information. It's not; you can only know it at runtime.
And if you only know that at runtime, you have to keep every previously created A[B] alive. This is a built-in memory leak, so it's no wonder Scala has no built-in way to do this.
If you really want this (and think long and hard about the memory leak), construct a Map[B, A[B]], try to get a cached instance from that map, and if it doesn't exist, construct one and put it in the map.
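A minimal sketch of that idea, assuming a single-threaded caller. `ACache` and `intern` are hypothetical names; the cache is an ordinary `mutable.Map[B, A[B]]`, so it relies on `B` having sensible `equals`/`hashCode`, and it never forgets anything.

```scala
import scala.collection.mutable

// The generic case class from the answer; `k` prints its argument once per instance.
case class A[B](b: B) {
  lazy val k = {
    println(b)
    1
  }
}

// Hypothetical helper: one cache per element type B, i.e. a Map[B, A[B]].
// Every instance it hands out stays reachable from `cache` forever,
// which is exactly the memory leak described above. Not thread-safe.
class ACache[B] {
  private val cache = mutable.Map.empty[B, A[B]]

  // Return the cached instance for b, constructing and storing one on first use.
  def intern(b: B): A[B] = cache.getOrElseUpdate(b, A(b))

  def size: Int = cache.size
}
```

Because `intern` returns the same object for equal arguments, `k` is evaluated (and `b` printed) only once per distinct argument.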
I believe case classes only consider the parameters of their primary constructor (not those of any auxiliary constructor) to be part of their equality. Similarly, when you use a case class in a match expression, unapply only gives you access (by default) to the constructor parameters.
Consider anything in the body of a case class as "extra" or "side effect" stuff. I consider it a good tactic to keep case classes as near-empty as possible and put any custom logic in a companion object. E.g.:
case class Foo(a: Int)
object Foo {
  // Explicit result type, since apply is overloaded here.
  def apply(s: String): Foo = Foo(s.toInt)
}
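To see the equality claim in action, here is a small sketch with a hypothetical `Point` case class. The mutable field in the body is there only to make the difference visible, not as a recommendation; the generated `equals`, `hashCode` and `unapply` ignore it.

```scala
// Hypothetical example: `x` is a constructor parameter, `label` is a body member.
case class Point(x: Int) {
  var label: String = ""
}

val p1 = Point(1)
val p2 = Point(1)
p1.label = "first"
p2.label = "second"

// The generated equals/hashCode only look at the constructor parameter,
// so the differing `label` values do not matter.
val sameByEquals = p1 == p2 // true
val sameObject = p1 eq p2   // false: two separate instances

// The generated unapply also only exposes the constructor parameter.
val extracted = p1 match { case Point(v) => v } // 1
```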
In addition to dhg's answer: I'm not aware of any functional language that does full constructor memoization by default. You should understand that such memoization means that all constructed instances stay in memory, which is not always desirable.
Manual caching is not that hard; consider this simple code:
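import scala.collection.mutable
class Doubler private (a: Int) {
  lazy val double = {
    println("calculated")
    a * 2
  }
}
object Doubler {
  // WeakHashMap holds its keys weakly: an entry can be dropped by the GC once
  // its boxed Int key is no longer referenced elsewhere. (Small Ints are boxed
  // from the JVM's shared Integer cache, so their entries effectively stay.)
  // Not thread-safe.
  val cache = mutable.WeakHashMap.empty[Int, Doubler]
  def apply(a: Int): Doubler = cache.getOrElseUpdate(a, new Doubler(a))
}
Doubler(1).double // calculated
Doubler(5).double // calculated
Doubler(1).double // most probably not calculated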

How to case match a View in scala

I'm implementing a class to constrain access to an iterable. Intermediate steps of the sequence (after some map, etc.) are expected to be too big for memory. Thus map (and the like: scanLeft, reduce, ...) should be lazy.
Internally I use map(...) = iterable.view.map( ... ). But it seems that calling view on an IterableView does not return the view itself, which produces useless indirection when calling map multiple times. It is probably not critical, but I'd like to call .view only if the iterable is not already a view.
So, how can I case-match a View?
class LazyIterable[A](iterable: Iterable[A]) {
  def map[B](f: A => B) = {
    val mapped = (iterable match {
      case v: View[A] => v // what should be here?
      case i: Iterable[A] => i.view
    }).map(f)
    new LazyIterable(mapped)
  }
  def compute() = iterable.toList
}
Note that I don't know what the input Iterable is: a concrete Seq (e.g. List, Vector) or a View. And if it is a View, I don't know which concrete view type it is (e.g. IterableView, SeqView, ...). I got lost in the class hierarchy of Views and ViewLikes.
In Scala 2.12, v: IterableView[A, _] is probably what you are looking for...
But I don't think you need any of this to begin with.
I simply don't see what having this wrapper buys you at all. What benefits does writing
new LazyIterable(myThing).map(myFunc).compute
have over
myThing.view.map(myFunc).toList
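To illustrate that a plain view already provides the laziness the wrapper is after, here is a small sketch with a hypothetical counting `myFunc`: building the pipeline calls nothing, and `toList` forces it. Chaining several `map` calls on a view just stacks more lazy steps.

```scala
object ViewDemo {
  // Counts invocations of the (hypothetical) mapping function.
  var calls = 0

  def myFunc(i: Int): Int = {
    calls += 1
    i * 10
  }

  val myThing: List[Int] = List(1, 2, 3)

  // Building the view pipeline does not call myFunc yet.
  val pipeline = myThing.view.map(myFunc).map(_ + 1)
  val callsBeforeForce: Int = calls

  // toList forces the view and runs myFunc once per element.
  val result: List[Int] = pipeline.toList
}
```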

How does flatMap on a Map work in Scala?

This is my code:
def testMap() = {
  val x = Map(
    1 -> Map(
      2 -> 3,
      3 -> 4
    ),
    5 -> Map(
      6 -> 7,
      7 -> 8
    )
  )
  for {
    (a, v) <- x
    (b, c) <- v
  } yield {
    a
  }
}
The code above gives
List(1, 1, 5, 5)
If I change the yield value of the for comprehension a to (a, b), the result is
Map(1 -> 3, 5 -> 7)
If I change (a, b) to (a, b, c), the result is
List((1,2,3), (1,3,4), (5,6,7), (5,7,8))
My question is what is the mechanism behind the determination of the result type in this for comprehension?
If you look at the details of the map method in the API documentation (Scala 2.12 and earlier), you will find that it has a second, implicit parameter of type CanBuildFrom.
An instance of CanBuildFrom defines how a certain collection is built when mapping over some other collection with a certain element type.
In the case where you get a Map as result, you are mapping over a Map and are providing binary tuples. So the compiler searches for a CanBuildFrom-instance, that can handle that.
To find such an instance, the compiler looks in different places, e.g. the current scope, the class a method is invoked on and its companion object.
In this case it will find an implicit member called canBuildFrom in the companion object of Map that is suitable and can be used to build a Map as the result. So it infers Map as the result type and, since this succeeds, uses this instance.
In the case where you provide single values or triples instead, the instance found in the companion of Map does not have the required type, so the compiler continues searching up the inheritance tree. It finds one in the companion object of Iterable. The instance there allows building an Iterable of an arbitrary element type, so the compiler uses that.
So why do you get a List? Because that happens to be the implementation used there, the type system only guarantees you an Iterable.
If you want to get an Iterable instead of a Map, you can provide a CanBuildFrom instance explicitly (only possible if you call map or flatMap directly) or simply ascribe the expected type. Doing so, you will also notice that you can't request a List this way, even though you get one at runtime.
This won't work:
val l: List[Int] = Map(1->2).map(x=>3)
This however will:
val l: Iterable[Int] = Map(1->2).map(x=>3)
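The rule can be checked with type ascriptions. In this sketch (it compiles on Scala 2.12 and 2.13; the mechanism differs, CanBuildFrom versus overloaded `map`, but the result types are the same), mapping to pairs keeps a `Map`, while mapping to single values only gives an `Iterable`.

```scala
val source: Map[Int, Int] = Map(1 -> 2, 3 -> 4)

// The function returns a pair: a Map can be built, so the result is a Map.
val pairs: Map[Int, Int] = source.map { case (k, v) => (k, v + 1) }

// The function returns a single Int: only an Iterable can be promised.
val singles: Iterable[Int] = source.map { case (_, v) => v + 1 }

// A List has to be requested explicitly, for example with toList.
val asList: List[Int] = singles.toList
```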
To add to @dth's answer: if you want a List, you can do:
val l = Map(1->2,3->4).view.map( ... ).toList
Here map is applied to a lazy IterableView and also returns an IterableView; the actual construction is triggered by toList.
Note that not using view can also lead to surprising behavior. For example:
val m = Map(2 -> 2, 3 -> 3)
val l1 = m.map { case (k, v) => (k / 2, v) }.toList
// List((1,3))
val l2 = m.view.map { case (k, v) => (k / 2, v) }.toList
// List((1,2), (1,3))
Here, omitting .view makes map output a Map, which overwrites duplicate keys (and does additional, unnecessary work).
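The same overwriting explains the `Map(1 -> 3, 5 -> 7)` result in the question: with `(a, b)` as the yield value, both pairs produced under key 1 share the key 1. A sketch that keeps every pair converts both generators to Lists, so neither the outer flatMap nor the inner map builds a Map. Converting only the outer one would not be enough, because the inner `map` over `v` would still collapse the pairs.

```scala
val x = Map(
  1 -> Map(2 -> 3, 3 -> 4),
  5 -> Map(6 -> 7, 7 -> 8)
)

// Both generators are Maps and the yield produces pairs: duplicate keys collapse.
val collapsed = for {
  (a, v) <- x
  (b, c) <- v
} yield (a, b)
// Map(1 -> 3, 5 -> 7)

// Both generators are Lists: every (a, b) pair is kept.
val allPairs = for {
  (a, v) <- x.toList
  (b, c) <- v.toList
} yield (a, b)
// List((1,2), (1,3), (5,6), (5,7))
```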

How to combine Maps with different value types in Scala

I have the following code which is working:
case class Step() {
  def bindings(): Map[String, Any] = ???
}
class Builder {
  private val globalBindings = scala.collection.mutable.HashMap.empty[String, Any]
  private val steps = scala.collection.mutable.ArrayBuffer.empty[Step]
  private def context: Map[String, Any] =
    globalBindings.foldLeft(Map[String, Any]())((l, r) => l + r) ++
      Map[String, Any]("steps" -> steps.foldLeft(Vector[Map[String, Any]]())((l, r) => l.+:(r.bindings)))
}
But I think it could be simplified so as to not need the first foldLeft in the 'context' method.
The desired result is to produce a map where the entry values are either a String, an object upon which toString will be invoked later, or a function which returns a String.
Is this the best I can do with Scala's type system or can I make the code clearer?
TIA
First of all, the toMap method on mutable.HashMap returns an immutable.Map, which replaces the first foldLeft. You can also replace the inner foldLeft with map, followed by toVector if you really need a Vector (which might be unnecessary). Finally, you can just use + to add the desired key-value pair of "steps" to the map.
So your whole method body could be:
globalBindings.toMap + ("steps" -> steps.map(_.bindings()).toVector)
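A self-contained sketch of the simplified method. It assumes a stub `Step` whose `bindings()` returns a fixed map (the question leaves it as `???`), makes `context` public, and adds hypothetical `bind`/`addStep` helpers only so it can be exercised. One behavioral difference is worth knowing: the original inner foldLeft used `+:`, which prepends, so it produced the steps in reverse order, while `map` keeps insertion order.

```scala
import scala.collection.mutable

// Stub for illustration only: the question leaves bindings() unimplemented.
case class Step(name: String) {
  def bindings(): Map[String, Any] = Map("name" -> name)
}

class Builder {
  private val globalBindings = mutable.HashMap.empty[String, Any]
  private val steps = mutable.ArrayBuffer.empty[Step]

  // Hypothetical helpers so the sketch can be exercised.
  def bind(key: String, value: Any): Builder = { globalBindings(key) = value; this }
  def addStep(step: Step): Builder = { steps += step; this }

  // toMap yields an immutable Map; `+` adds the "steps" entry.
  def context: Map[String, Any] =
    globalBindings.toMap + ("steps" -> steps.map(_.bindings()).toVector)
}
```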
I'd also note that you should be apprehensive of using types like Map[String, Any] in Scala. So much of the power of Scala comes from its type system and it can be used to great effect in many such situations, and so these types are often considered unidiomatic. Of course, there are situations where this approach makes the most sense, and without more context it would be hard to determine if that were true here.

Scala: What is the most efficient way convert a Map[K,V] to an IntMap[V]?

Let's say I have a class Point with a toInt method, and I have an immutable Map[Point,V], for some type V. What is the most efficient way in Scala to convert it to an IntMap[V]? Here is my current implementation:
import scala.collection.immutable.IntMap
def pointMap2IntMap[T](points: Map[Point,T]): IntMap[T] = {
  var result: IntMap[T] = IntMap.empty[T]
  for (t <- points) {
    result = result.updated(t._1.toInt, t._2)
  }
  result
}
[EDIT] I meant primarily faster, but I would also be interested in shorter versions, even if they are not obviously faster.
IntMap has a built-in factory method (apply) for this:
IntMap(points.map(p => (p._1.toInt, p._2)).toSeq: _*)
If speed is an issue, you may use:
points.foldLeft(IntMap.empty[T])((m, p) => m.updated(p._1.toInt, p._2))
A one-liner that uses breakOut to obtain an IntMap (Scala 2.12 and earlier; breakOut was removed in 2.13). It maps to a new collection using a custom builder factory (CanBuildFrom) that the breakOut call resolves:
import scala.collection.breakOut
import scala.collection.immutable
Map[Int, String](1 -> "").map(kv => kv)(breakOut[Map[Int, String], (Int, String), immutable.IntMap[String]])
In terms of performance, it's hard to tell, but it creates a new IntMap, goes through all the bindings and adds them to the IntMap. A handwritten iterator while loop (preceded with a pattern match to check if the source map is an IntMap) would possibly result in somewhat better performance.
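A sketch of the handwritten-loop variant mentioned above, assuming a minimal hypothetical `Point` whose `toInt` is the conversion key. There is no IntMap fast path here, because a `Map[Point, T]` can never already be an IntMap. Whether the loop actually beats the foldLeft version would have to be measured.

```scala
import scala.collection.immutable.IntMap

// Hypothetical Point with the toInt conversion the question assumes.
final case class Point(x: Int, y: Int) {
  def toInt: Int = x * 1000 + y
}

object PointMaps {
  // Explicit iterator and while loop instead of foldLeft.
  def pointMap2IntMap[T](points: Map[Point, T]): IntMap[T] = {
    var result = IntMap.empty[T]
    val it = points.iterator
    while (it.hasNext) {
      val (p, v) = it.next()
      result = result.updated(p.toInt, v)
    }
    result
  }

  // The foldLeft version from the answer above, for comparison.
  def viaFold[T](points: Map[Point, T]): IntMap[T] =
    points.foldLeft(IntMap.empty[T])((m, p) => m.updated(p._1.toInt, p._2))
}
```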