Adding objects to a map keyed on a member of these objects - scala

I'm trying to find a (functional) way to add a collection of objects into a map that is keyed on a member of these objects.
Let's say I have the following objects (they're all instances of the same class O):
o1(a = 1, b = x)
o2(a = 1, b = y)
o3(a = 2, b = z)
I want to generate a Map keyed on member a that contains the following tuples:
(1, List(o1, o2))
(2, List(o3))
Now I could obviously do it iteratively, going through each object in my initial list and adding them as I go along. But I feel I am missing an easy functional way of doing that. I've been struggling with maps, flatMaps and filters to try to achieve that, with no result so far.

groupBy is what you want:
scala> val os = List(O(1,2), O(1,3), O(2,4))
os: List[O] = List(O(1,2), O(1,3), O(2,4))
scala> os.groupBy(_.a)
res3: scala.collection.immutable.Map[Int,List[O]] = Map(1 -> List(O(1,2), O(1,3)), 2 -> List(O(2,4)))
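For readers who want to run this outside the REPL, here is a minimal sketch. It assumes O is a case class with an Int field a and a String field b, as in the question; the names GroupByDemo, os and byA are made up for illustration.

```scala
// Hypothetical stand-in for the asker's class O; field names a and b follow the question.
final case class O(a: Int, b: String)

object GroupByDemo {
  val os: List[O] = List(O(1, "x"), O(1, "y"), O(2, "z"))

  // groupBy builds an immutable Map from each key to the elements that share it,
  // keeping the original order inside each group.
  val byA: Map[Int, List[O]] = os.groupBy(_.a)

  def main(args: Array[String]): Unit =
    byA.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k -> $v") }
}
```

groupBy accepts any key function, so the same call works for a derived key (for example _.b.length), not only for a member.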

Checking sameness/equality in Scala

As I asked in another post (Unique id for Scala object), it seems that Scala has no object id like Python's.
I still need to check sameness in Scala for unit tests. I run a test and compare the returned value, a nested collection object (e.g. List[Map[Int, ...]]), with one that I create.
However, the hashCode of a mutable map is the same as that of an immutable map with the same entries. As a result (x == y) returns true.
scala> val x = Map("a" -> 10, "b" -> 20)
x: scala.collection.immutable.Map[String,Int] = Map(a -> 10, b -> 20)
scala> x.hashCode
res0: Int = -1001662700
scala> val y = collection.mutable.Map("b" -> 20, "a" -> 10)
y: scala.collection.mutable.Map[String,Int] = Map(b -> 20, a -> 10)
scala> y.hashCode
res2: Int = -1001662700
In some cases that's OK, but in other cases I may need the test to fail. So, here are my questions.
Q1: What is the usual way to check that two values (including very complicated data types) are the same? I could compare the toString() results, but I don't think this is a good idea.
Q2: Is it a general rule that a mutable data structure has the same hashCode as its immutable counterpart?
You are looking for AnyRef.eq, which tests reference equality. It is as close as you can get to Python's id function, and it is equivalent if you only want to compare references and don't care about the actual ID:
scala> x == y
true
scala> x eq y
false
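The difference can be sketched as follows. Scala maps compare by their entries regardless of mutability, and equal objects must have equal hash codes, which is why the two hashCodes above match. The helper strictlyEqual is a hypothetical addition for tests that should fail when the concrete class differs; it is not part of the standard library.

```scala
import scala.collection.mutable

object EqualityDemo {
  val x: Map[String, Int] = Map("a" -> 10, "b" -> 20)
  val y: mutable.Map[String, Int] = mutable.Map("b" -> 20, "a" -> 10)

  // == delegates to equals, which for maps compares entries, not the concrete class.
  val sameContents: Boolean = x == y

  // eq compares references: two separately built maps are never the same object.
  val sameReference: Boolean = x eq y

  // Hypothetical stricter check: equal contents and the same runtime class.
  def strictlyEqual(l: AnyRef, r: AnyRef): Boolean =
    l == r && l.getClass == r.getClass
}
```

With these definitions, sameContents is true, sameReference is false, and strictlyEqual(x, y) is false because one map is immutable and the other mutable.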

When does Scala actually copy objects?

Background
I have a chunk of code that looks like this:
val big_obj = new BigObj
big_obj.recs(5).foo()
... // other code
big_obj.recs(7).bar()
Problem
I want to do something like this
val big_obj = new BigObj
alias ref = big_obj.recs // looking for something like an alias
ref(5).foo()
... // other code
ref(7).bar()
because I am afraid of making copies of big objects (coming from C++). But then I realised that Scala is probably smart and if I simply do this:
val big_obj = new BigObj
val ref = big_obj.recs // no copies made?
the compiler is probably smart enough to not copy anyways, since it's all read-only.
Question
This got me wondering about Scala's memory model.
Under what situations will copies be made/not made?
I am looking for a simple answer or rule-of-thumb which I can keep in my mind when I deal with really_big_objects, whenever I make assignments, including passing arguments.
Just as in Java (and Python, and probably a lot of other languages), objects are never copied implicitly. When you assign an object or pass it as an argument, only the reference to the object is copied; the actual object just sits in memory and has one more thing pointing to it. The only things that get copied are primitives (integers, doubles, etc.).
As you pointed out, this is obviously good for immutable objects, but it's true of all objects, even mutable ones:
scala> val a = collection.mutable.Map(1 -> 2)
a: scala.collection.mutable.Map[Int,Int] = Map(1 -> 2)
scala> val b = a
b: scala.collection.mutable.Map[Int,Int] = Map(1 -> 2)
scala> b += (2 -> 4)
res41: b.type = Map(2 -> 4, 1 -> 2)
scala> a
res42: scala.collection.mutable.Map[Int,Int] = Map(2 -> 4, 1 -> 2)
scala> def addTo(m: collection.mutable.Map[Int,Int]): Unit = { m += (3 -> 9) }
addTo: (m: scala.collection.mutable.Map[Int,Int])Unit
scala> addTo(b)
scala> a
res44: scala.collection.mutable.Map[Int,Int] = Map(2 -> 4, 1 -> 2, 3 -> 9)
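Applied to the question's scenario, a rough sketch might look like this. BigObj and Rec are hypothetical stand-ins, since the original classes are not shown; the point is only that val ref = bigObj.recs binds a second name to the same buffer.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for the asker's BigObj and its records.
final class Rec {
  var hits: Int = 0
  def foo(): Unit = hits += 1
}

final class BigObj {
  val recs: ArrayBuffer[Rec] = ArrayBuffer.fill(10)(new Rec)
}

object AliasDemo {
  val bigObj = new BigObj
  val ref: ArrayBuffer[Rec] = bigObj.recs // copies only the reference; no Rec is duplicated
  ref(5).foo()                            // mutates the record owned by bigObj
}
```

After this runs, ref eq bigObj.recs holds, and the change made through ref is visible through bigObj.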

immutable val vs mutable ArrayBuffer

mutable vs. immutable in Scala collections
Before posting this question, I read the article above. Apparently if you store something in a val, you can't modify it, but if you store a mutable collection such as an ArrayBuffer, you can modify it!
scala> val b = ArrayBuffer[Int](1,2,3)
b: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
scala> b += 1
res50: b.type = ArrayBuffer(1, 2, 3, 1)
scala> b
res51: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 1)
What is the use of using val to store a mutable ArrayBuffer? I assume the only reason b changes is because val b holds the memory address to that ArrayBuffer(1,2,3).
If you try var x = 1; val y = x; x = 5; y, the output will still be 1. In this case, y stores an actual value instead of the address to x.
Java doesn't have this confusion because it's clear an Object can't be assigned to an int variable.
How do I know when a variable in Scala is carrying a value and when a memory address? What's the point of storing a mutable collection in an immutable variable?
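A simple answer is that vals and vars are all references. At the language level there are no primitive types in Scala; every value is an object, even though the compiler represents types such as Int as JVM primitives where it can.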
val x = 1
is a reference named x that points to an immutable integer object 1. You cannot do 1.changeTo(2) or something, so if you have
val value = 5
val x = value
var y = value
You can do y += 10. Because Int has no += method, this is rewritten to y = y + 10, which makes y reference a new object, (5 + 10) = 15. The original 5 remains 5.
On the other hand, you cannot do x += 10 because x is a val which means it must always point to 5. So, this doesn't compile.
You may wonder why you can do val b = ArrayBuffer(...) and then b += something even though b is a val. That's because += here is a method call, not an assignment. Calling b += something gets translated to b.+=(something). The method += just adds a new element (something) to its mutable self and returns itself, so the result can be assigned or chained.
Let's see an example
scala> val xs = ArrayBuffer(1,2,3)
xs: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
scala> val ys = ( xs += 999 )
ys: xs.type = ArrayBuffer(1, 2, 3, 999)
scala> xs
res0: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 999)
scala> ys
res1: xs.type = ArrayBuffer(1, 2, 3, 999)
scala> xs eq ys
res2: Boolean = true
This confirms that xs and ys point to the same (mutable) ArrayBuffer. The eq method is like Java's ==, which compares object identity. Mutable/immutable references (var/val) and mutable/immutable data structures (ArrayBuffer, List) are different things. So, if you do another xs += 888, ys, which is an immutable reference pointing to a mutable data structure, also contains 888.
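The two meanings of += can be put side by side in a small sketch (the object and value names here are made up for illustration):

```scala
import scala.collection.mutable.ArrayBuffer

object PlusEqualsDemo {
  var y: Int = 5
  val before: Int = y
  y += 10 // Int has no += method, so this is rewritten to y = y + 10

  val xs: ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
  // ArrayBuffer defines +=, so this is the method call xs.+=(999), which returns xs itself.
  val ys: ArrayBuffer[Int] = (xs += 999)
}
```

The first += needs a var because it rebinds the name; the second works on a val because the name keeps pointing at the same buffer.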
What's the point of storing a mutable collection in an immutable variable?
scala> val a = ArrayBuffer(1)
scala> a = ArrayBuffer[Int]()
<console>:9: error: reassignment to val
It prevents the variable from being assigned to a new memory address. In practice, though, Scala encourages you not to use mutable state (to avoid locking, blocking, etc.), so I'm having trouble coming up with an example of a real situation where the choice of var or val for mutable state matters.
An immutable object and a constant value are two different things.
Defining your collection as a val means that the variable will always refer to the same collection instance. That instance can be mutable or immutable: if it is immutable you cannot add or remove items in it; if it is mutable you can. Adding to or removing from an immutable collection always produces a new collection.
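How do I know when a variable in Scala is carrying a value and when a memory address?
Scala's main platform is the JVM (.NET support was discontinued), so types that are primitive on the JVM, such as Int and Double, are treated as primitives by Scala: a variable of such a type carries the value itself, and a variable of any other type carries a reference.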
What is the use of using val to store a mutable ArrayBuffer?
The closest alternative would be to use a var to store an immutable Seq. If that Seq was very large, you wouldn't want to have to copy the whole Seq every time you made a change to it - but that's what you might have to do! That would be very slow!
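The two alternatives can be compared in a short sketch. The names are illustrative only, and Vector is used as the immutable Seq; note that Vector shares structure between versions, so appending does not copy every element, although other immutable sequences may behave differently.

```scala
import scala.collection.mutable.ArrayBuffer

object ValVarDemo {
  // var + immutable collection: "adding" builds a new Vector and rebinds the variable.
  var v: Vector[Int] = Vector(1, 2, 3)
  val oldV: Vector[Int] = v
  v = v :+ 4 // oldV still sees Vector(1, 2, 3)

  // val + mutable collection: the binding is fixed and the buffer changes in place.
  val buf: ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
  val sameBuf: ArrayBuffer[Int] = buf
  buf += 4 // sameBuf sees the new element too
}
```

Anyone holding oldV is unaffected by the update to v, whereas anyone holding sameBuf observes the change to buf.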

Representing a graph (adjacency list) with HashMap[Int, Vector[Int]] (Scala)?

I was wondering how (if possible) I can go about making an adjacency list representation of a (mutable) graph via HashMap[Int, Vector[Int]]. HashMap would be mutable of course.
Currently I have it set as HashMap[Int, ArrayBuffer[Int]], but the fact that I can change each cell in the ArrayBuffer makes me uncomfortable, even though I'm fairly certain I'm not doing that. I would use a ListBuffer[Int], but I would like fast random access to neighbors because I need to do fast random walks on the graphs. A Vector[Int] would solve this problem, but is there any way to do this?
To my knowledge (tried this in the REPL), this won't work:
scala> val x = new mutable.HashMap[Int, Vector[Int]]
x: scala.collection.mutable.HashMap[Int,Vector[Int]] = Map()
scala> x(3) = Vector(1)
scala> x(3) += 4 // DOES NOT WORK
I need to be able to both append to it at any time and also access any element within it randomly (given the index). Is this possible?
Thanks!
-kstruct
Using the Vector:
x += 3 -> (x(3) :+ 4) // x.type = Map(3 -> Vector(1, 4))
You might notice that this will fail if there's no existing key, so you might like to set up your map as
val x = new mutable.HashMap[Int, Vector[Int]] withDefaultValue Vector.empty
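Putting the two pieces together, a minimal adjacency-list sketch could look like this. GraphDemo, adj and addEdge are hypothetical names, not part of the original answer; the update syntax adj(from) = ... is shorthand for adj.update(from, ...).

```scala
import scala.collection.mutable

object GraphDemo {
  // Missing vertices read as an empty neighbor list; the default is not stored until updated.
  val adj: mutable.Map[Int, Vector[Int]] =
    mutable.HashMap.empty[Int, Vector[Int]].withDefaultValue(Vector.empty)

  // Hypothetical helper: append `to` to the neighbors of `from`.
  def addEdge(from: Int, to: Int): Unit =
    adj(from) = adj(from) :+ to // calls adj.update(from, ...)

  addEdge(3, 1)
  addEdge(3, 4)
  addEdge(7, 3)
}
```

Each neighbor list is an immutable Vector, so no cell can be modified in place, while adj(3)(1) still gives indexed access for random walks.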

Scala Tuple Deconstruction

I am new to Scala, and ran across a small hiccup that has been annoying me.
Initializing two vars in parallel works great: var (x,y) = (1,2)
However I can't find a way to assign new values in parallel: (x,y) = (x+y,y-x) //invalid syntax
I end up writing something like this: val xtmp = x+y; y = y-x; x = xtmp
I realize writing functional code is one way of avoiding this, but there are certain situations where vars just make more sense.
I have two questions:
1) Is there a better way of doing this? Am I missing something?
2) What is the reason for not allowing true parallel assignment?
Unfortunately, you cannot do multiple assignments in Scala. But you may use tuples, if they fit your problem:
scala> var xy = (1,2)
xy: (Int, Int) = (1,2)
scala> xy = (xy._1 + xy._2, xy._2 - xy._1)
xy: (Int, Int) = (3,1)
This way, xy is one tuple with two values. The first value can be accessed using xy._1, the second one using xy._2.
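If x and y need to stay separate vars, one possible workaround is to destructure the new values into temporary vals and then assign them. This is only a sketch; ParallelAssignDemo and step are made-up names.

```scala
object ParallelAssignDemo {
  var x: Int = 1
  var y: Int = 2

  // Hypothetical helper: both right-hand sides are computed from the old x and y
  // before either variable is reassigned.
  def step(): Unit = {
    val (nx, ny) = (x + y, y - x)
    x = nx
    y = ny
  }
}
```

Starting from x = 1 and y = 2, one call to step() leaves x = 3 and y = 1, matching the tuple version above.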
Scala has two kinds of variables: vals and vars. Vals are similar to Java's final variables, so, as far as I understand what you're asking, the only way to assign values in parallel to vals is at the point of definition:
scala> val (x, y) = (1, 2);
or
scala> val s = (3, 4);
s: (Int, Int) = (3,4)
scala> s._1
res1: Int = 3
scala> s._2
res2: Int = 4