When does Scala actually copy objects? - scala

Background
I have a chunk of code that looks like this:
val big_obj = new BigObj
big_obj.recs[5].foo()
... // other code
big_obj.recs[7].bar()
Problem
I want to do something like this
val big_obj = new BigObj
alias ref = big_obj.recs // looking for something like an alias
ref[5].foo()
... // other code
ref[7].bar()
because I am afraid of making copies of big objects (coming from C++). But then I realised that Scala is probably smart and if I simply do this:
val big_obj = new BigObj
val ref = big_obj.recs // no copies made?
the compiler is probably smart enough to not copy anyways, since it's all read-only.
Question
This got me wondering about Scala's memory model.
Under what situations will copies be made/not made?
I am looking for a simple answer or rule-of-thumb which I can keep in my mind when I deal with really_big_objects, whenever I make assignments, including passing arguments.

Just like Java (and python and probably a lot of other languages), copies are never made of objects. When you assign an object or pass it as an argument, it only copies the reference to the object; the actual object just sits in memory and has an extra thing pointing to it. The only things that would get copied are primitives (integers, doubles, etc).
As you pointed out, this is obviously good for immutable objects, but it's true of all objects, even mutable ones:
scala> val a = collection.mutable.Map(1 -> 2)
a: scala.collection.mutable.Map[Int,Int] = Map(1 -> 2)
scala> val b = a
b: scala.collection.mutable.Map[Int,Int] = Map(1 -> 2)
scala> b += (2 -> 4)
res41: b.type = Map(2 -> 4, 1 -> 2)
scala> a
res42: scala.collection.mutable.Map[Int,Int] = Map(2 -> 4, 1 -> 2)
scala> def addTo(m: collection.mutable.Map[Int,Int]) { m += (3 -> 9) }
addTo: (m: scala.collection.mutable.Map[Int,Int])Unit
scala> addTo(b)
scala> a
res44: scala.collection.mutable.Map[Int,Int] = Map(2 -> 4, 1 -> 2, 3 -> 9)

Related

Checking sameness/equality in Scala

As I asked in other post (Unique id for Scala object), it doesn't seem like that I can have id just like Python.
I still need to check the sameness in Scala for unittest. I run a test and compare the returned value of some nested collection object (i.e., List[Map[Int, ...]]) with the one that I create.
However, the hashCode for mutable map is the same as that of immutable map. As a result (x == y) returns True.
scala> val x = Map("a" -> 10, "b" -> 20)
x: scala.collection.immutable.Map[String,Int] = Map(a -> 10, b -> 20)
scala> x.hashCode
res0: Int = -1001662700
scala> val y = collection.mutable.Map("b" -> 20, "a" -> 10)
y: scala.collection.mutable.Map[String,Int] = Map(b -> 20, a -> 10)
scala> y.hashCode
res2: Int = -1001662700
In some cases, it's OK, but in other cases, I may need to make it failed test. So, here comes my question.
Q1: What is the normally used method for comparing two values (including very complicated data types) are the same? I may compare the toString() results, but I don't think this is a good idea.
Q2: Is it a general rule that mutable data structure has the same hashCode with immutable counterpart?
You are looking for AnyRef.eq which does reference equality (which is as close as you can get to Python's id function and is identical if you just want to compare references and you don't care about the actual ID):
scala> x == y
true
scala> x eq y
false

Scala HashMap: doesn't += reassign to left hand side?

I am reading Seven Languages in Seven Weeks to get the taste of different programming paradigm. In the chapter about Scala, I found out that collection are immutable (at least the one from scala.collection.immutable).
However, there is an example which confuses me:
scala> val hashMap = HashMap(0->0)
scala> hashMap += 1->1
scala> hashMap
res42: scala.collection.mutable.HashMap[Int,Int] = Map(1 -> 1, 0 -> 0)
but
scala> map = map + 2->2
<console>:9: error: reassignment to val
map = map + 2->2
Is it += reassigning an immutable collection? How is that that += can reassign a val HashMap while = fails?
Moreover, I tried out with other collections (List and Map) and with "primitive" (Int) and += fails with the reassignment error. How are HashMaps special? I do not read anything particular in the Scala API and I cannot find a definition for += operator (I am assuming it to be an operator and not a function even in Scala, as well as in C++ or Java).
Sorry for the dumb question, but since I am new to Scala I am having difficulties finding resources by myself.
You're right that this works with a var, where the compiler can take
hashMap += 1->1
and desugar it to
hashMap = hashMap + 1->1
But there's another possibility too. If your hashMap is of the type scala.collection.mutable.hashMap then it directly calls the += method defined on that type:
hashMap.+=(1->1)
No val is reassigned, the map just internally mutates itself.
hashMap in code sample is collection.mutable.HashMap[Int,Int]. Note the mutable package.
There is a mutable version for many scala collections.
And there is method with name += in mutable.HashMap.
There are two types of collections in Scala: Mutable and Immutable
Mutable: http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.mutable.package
Immutable: http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.immutable.package
What you used definitely belongs to mutable category and hence can be reassigned. In fact, if you try this:
val hashMap = scala.collection.immutable.HashMap(0->0)
hashMap += (1->1)
you'll get same error
Unlike some other languages, x += y doesn't always compile into x = x + y in Scala. hashMap += 1->1 is actually an infix method call (+= is a valid method name in Scala), which is defined in mutable.HashMap class.
Your first example uses mutable HashMap, as you can see in the last line. Immutable HashMap doesn't have += method.
Note that for a mutable HashMap
scala> val map = scala.collection.mutable.HashMap[Int,Int](0 -> 0)
map: scala.collection.mutable.HashMap[Int,Int] = Map(0 -> 0)
its contents can be changed without using val or var,
scala> map += 1->1
res1: map.type = Map(1 -> 1, 0 -> 0)
scala> map += 2->2
res2: map.type = Map(2 -> 2, 1 -> 1, 0 -> 0)
scala> map
res3: scala.collection.mutable.HashMap[Int,Int] = Map(2 -> 2, 1 -> 1, 0 -> 0)
However for an immutable HashMap declared with val
scala> val imap = scala.collection.immutable.HashMap[Int,Int](0 -> 0)
imap: scala.collection.immutable.HashMap[Int,Int] = Map(0 -> 0)
we cannot for instance add new pairs,
scala> imap += 1->1
<console>:10: error: value += is not a member of scala.collection.immutable.HashMap[Int,Int]
imap += 1->1
^
However we can create a new HashMap from the original and add a new pair,
scala> val imap2 = imap.updated(1,1)
imap2: scala.collection.immutable.HashMap[Int,Int] = Map(0 -> 0, 1 -> 1)
Even so, an immutable HashMap declared with var
scala> var imap = scala.collection.immutable.HashMap[Int,Int](0 -> 0)
imap: scala.collection.immutable.HashMap[Int,Int] = Map(0 -> 0)
allows for updating the contents,
scala> imap += 1->1
scala> imap
res11: scala.collection.immutable.HashMap[Int,Int] = Map(0 -> 0, 1 -> 1)

Adding objects to a map keyed on a member of these objects

I'm trying to find a (functional) way to add a collection of objects into a map that is keyed on a member of these objects.
Let's say I have the following objects (they're all instances of the same class O):
o1(a = 1, b = x)
o2(a = 1, b = y)
o3(a = 2, b = z)
I want to generate a Map keyed on member a that contains the following tuples:
(1, List(o1, o2))
(2, List(o3))
Now I could obviously do it iteratively, going through each object in my initial list and adding them as I go along. But I feel I am missing a functional way of doing that easily. I've been struggling with maps, flatMaps and filters to try to achieve that, no result so far.
groupBy is what you want:
scala> val os = List(O(1,2), O(1,3), O(2,4))
os: List[O] = List(O(1,2), O(1,3), O(2,4))
scala> os.groupBy(_.a)
res3: scala.collection.immutable.Map[Int,List[O]] = Map(1 -> List(O(1,2), O(1,3)), 2 -> List(O(2,4)))

How to convert Map[String,Seq[String]] to Map[String,String]

I have a Map[String,Seq[String]] and want to basically covert it to a Map[String,String] since I know the sequence will only have one value.
Someone else already mentioned mapValues, but if I were you I would do it like this:
scala> val m = Map(1 -> Seq(1), 2 -> Seq(2))
m: scala.collection.immutable.Map[Int,Seq[Int]] = Map(1 -> List(1), 2 -> List(2))
scala> m.map { case (k,Seq(v)) => (k,v) }
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2)
Two reasons:
The mapValues method produces a view of the result Map, meaning that the function will be recomputed every time you access an element. Unless you plan on accessing each element exactly once, or you only plan on accessing a very small percentage of them, you don't want that recomputation to take place.
Using a case with (k,Seq(v)) ensures that an exception will be thrown if the function ever sees a Seq that doesn't contain exactly one element. Using _(0) or _.head will throw an exception if there are zero elements, but will not complain if you had more than one, which will likely result in mysterious bugs later on when things go missing without errors.
You can use mapValues().
scala> Map("a" -> Seq("aaa"), "b" -> Seq("bbb"))
res0: scala.collection.immutable.Map[java.lang.String,Seq[java.lang.String]] = M
ap(a -> List(aaa), b -> List(bbb))
scala> res0.mapValues(_(0))
res1: scala.collection.immutable.Map[java.lang.String,java.lang.String] = Map(a
-> aaa, b -> bbb)
I think I got it by doing the following:
mymap.flatMap(x => Map(x._1 -> x._2.head))
Yet another suggestion:
m mapValues { _.mkString }
This one's agnostic to whether the Seq has multiple elements -- it'll just concatenate all the strings together. If you're concerned about the recomputation of each value, you can make it happen up-front:
(m mapValues { _.mkString }).view.force

Nested Default Maps in Scala

I'm trying to construct nested maps in Scala, where both the outer and inner map use the "withDefaultValue" method. For example, the following :
val m = HashMap.empty[Int, collection.mutable.Map[Int,Int]].withDefaultValue( HashMap.empty[Int,Int].withDefaultValue(3))
m(1)(2)
res: Int = 3
m(1)(2) = 5
m(1)(2)
res: Int = 5
m(2)(3) = 6
m
res : scala.collection.mutable.Map[Int,scala.collection.mutable.Map[Int,Int]] = Map()
So the map, when addressed by the appropriate keys, gives me back what I put in. However, the map itself appears empty! Even m.size returns 0 in this example. Can anyone explain what's going on here?
Short answer
It's definitely not a bug.
Long answer
The behavior of withDefaultValue is to store a default value (in your case, a mutable map) inside the Map to be returned in the case that they key does not exist. This is not the same as a value that is inserted into the Map when they key is not found.
Let's look closely at what's happening. It will be easier to understand if we pull the default map out as a separate variable so we can inspect is at will; let's call it default
import collection.mutable.HashMap
val default = HashMap.empty[Int,Int].withDefaultValue(3)
So default is a mutable map (that has its own default value). Now we can create m and give default as the default value.
import collection.mutable.{Map => MMap}
val m = HashMap.empty[Int, MMap[Int,Int]].withDefaultValue(default)
Now whenever m is accessed with a missing key, it will return default. Notice that this is the exact same behavior as you have because withDefaultValue is defined as:
def withDefaultValue (d: B): Map[A, B]
Notice that it's d: B and not d: => B, so it will not create a new map each time the default is accessed; it will return the same exact object, what we've called default.
So let's see what happens:
m(1) // Map()
Since key 1 is not in m, the default, default is returned. default at this time is an empty Map.
m(1)(2) = 5
Since m(1) returns default, this operation stores 5 as the value for key 2 in default. Nothing is written to the Map m because m(1) resolves to default which is a separate Map entirely. We can check this by viewing default:
default // Map(2 -> 5)
But as we said, m is left unchanged
m // Map()
Now, how to achieve what you really wanted? Instead of using withDefaultValue, you want to make use of getOrElseUpdate:
def getOrElseUpdate (key: A, op: ⇒ B): B
Notice how we see op: => B? This means that the argument op will be re-evaluated each time it is needed. This allows us to put a new Map in there and have it be a separate new Map for each invalid key. Let's take a look:
val m2 = HashMap.empty[Int, MMap[Int,Int]]
No default values needed here.
m2.getOrElseUpdate(1, HashMap.empty[Int,Int].withDefaultValue(3)) // Map()
Key 1 doesn't exist, so we insert a new HashMap, and return that new value. We can check that it was inserted as we expected. Notice that 1 maps to the newly added empty map and that they 3 was not added anywhere because of the behavior explained above.
m2 // Map(1 -> Map())
Likewise, we can update the Map as expected:
m2.getOrElseUpdate(1, HashMap.empty[Int,Int].withDefaultValue(1))(2) = 6
and check that it was added:
m2 // Map(1 -> Map(2 -> 6))
withDefaultValue is used to return a value when the key was not found. It does not populate the map. So you map stays empty. Somewhat like using getOrElse(a, b) where b is provided by withDefaultValue.
I just had the exact same problem, and was happy to find dhg's answer. Since typing getOrElseUpdate all the time is not very concise, I came up with this little extension of the idea that I want to share:
You can declare a class that uses getOrElseUpdate as default behavior for the () operator:
class DefaultDict[K, V](defaultFunction: (K) => V) extends HashMap[K, V] {
override def default(key: K): V = return defaultFunction(key)
override def apply(key: K): V =
getOrElseUpdate(key, default(key))
}
Now you can do what you want to do like this:
var map = new DefaultDict[Int, DefaultDict[Int, Int]](
key => new DefaultDict(key => 3))
map(1)(2) = 5
Which does now result in map containing 5 (or rather: containing a DefaultDict containing the value 5 for the key 2).
What you're seeing is the effect that you've created a single Map[Int, Int] this is the default value whenever the key isn't in the outer map.
scala> val m = HashMap.empty[Int, collection.mutable.Map[Int,Int]].withDefaultValue( HashMap.empty[Int,Int].withDefaultValue(3))
m: scala.collection.mutable.Map[Int,scala.collection.mutable.Map[Int,Int]] = Map()
scala> m(2)(2)
res1: Int = 3
scala> m(1)(2) = 5
scala> m(2)(2)
res2: Int = 5
To get the effect that you're looking for, you'll have to wrap the Map with an implementation that actually inserts the default value when a key isn't found in the Map.
Edit:
I'm not sure what your actual use case is, but you may have an easier time using a pair for the key to a single Map.
scala> val m = HashMap.empty[(Int, Int), Int].withDefaultValue(3)
m: scala.collection.mutable.Map[(Int, Int),Int] = Map()
scala> m((1, 2))
res0: Int = 3
scala> m((1, 2)) = 5
scala> m((1, 2))
res3: Int = 5
scala> m
res4: scala.collection.mutable.Map[(Int, Int),Int] = Map((1,2) -> 5)
I know it's a bit late but I've just seen the post while I was trying to solve the same problem.
Probably the API are different from the 2012 version but you may want to use withDefaultinstead that withDefaultValue.
The difference is that withDefault takes a function as parameter, that is executed every time a missed key is requested ;)