Scala Sets Immutability - scala

scala> var immSet = Set("A", "B")
immSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> immSet += "C"
scala> println(immSet)
Set(A, B, C)
I wonder, what is the advantage I am getting by allowing var to be used with with an immutable Set? Am I not losing immutability in this case?

What is the advantage I am getting by allowing var to be used with
with a Immutabable Sets?
I would say this can mainly cause confusion. The fact that you're using a var allows you to overwrite the variable, but, the Set by itself doesn't change, it allocates a new set with the additional value "C". But since you're using a var, the previous Set is now no longer referenced, unless you've referenced is somewhere else higher up the stack:
scala> var firstSet = Set("A", "B")
firstSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> var secondSet = firstSet
secondSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> firstSet += "C"
scala> firstSet
res6: scala.collection.immutable.Set[String] = Set(A, B, C)
scala> secondSet
res7: scala.collection.immutable.Set[String] = Set(A, B)
Because secondSet still points to the Set created by firstSet, we don't see the value update reflected. I think making this immutable adds clarity that the underlying Set is immutable and well as the variable pointing to it. When you use a val, the compiler will yell if you attempt to reassign, forcing you to realize that a new collection is initialized.
Regarding immutability, we need to divide this into two. There is the immutability of the Set, and there is the immutability of the variable pointing to that Set, these are two different things. You lose the latter with this approach.

Read #YuvalItzchakov's answer if you want to understand what is the basic difference between immutable collection defined as var and mutable collection defined as val. I'll concentrate on practical aspects of both approaches.
First of all, both approaches imply mutability. If you want to stay "pure functional" you should avoid either of them.
Now, if you want mutable collection, what is the best way? Short answer, it depends.
Performance. Mutable collections are usually faster than their immutable counterparts. It means, that if your mutable variable is somehow contained (for example, doesn't escape private method), it may be better to use val c = MutableCollection(). Most of the Scala's methods in Collections API internally use mutable collections.
Thread safety. Value of immutable collection is always thread safe. You can send it to another thread and don't think about visibility and concurrent changes.
var a = ImmutableCol()
otherThreadProcessor.process(a)
a += 1 // otherThread will still have previous value
On the other hand, if you want to modify collection from multiple threads, better use Java's concurrent collection API.
Code clarity. Imagine you have some function that takes collection as an argument and then modifies it in some ways. If collection is mutable, then, after function returns, collection, passed as an argument, will stay modified.
def recImmutable(a:Set[Int]): Unit = {
var b = a
b += 4
}
val a = Set(2,3)
recImmutable(a)
println(a)
// prints Set(2, 3)
def recMutable(a:mutable.Set[Int]): Unit = {
var b = a
b += 4
}
val b = mutable.Set(2,3)
recMutable(b)
println(b)
// prints Set(2, 3, 4)

Related

Scala: Update Array inside a Map

I am creating a Map which has an Array inside it. I need to keep adding values to that Array. How do I do that?
var values: Map[String, Array[Float]] = Map()
I tried several ways such as:
myobject.values.getOrElse("key1", Array()).++(Array(float1))
Few other ways to but nothing updates the array inside the Map.
There is a problem with this code:
values.getOrElse("key1", Array()).++(Array(float1))
This does not update the Map in values, it just creates a new Array and then throws it away.
You need to replace the original Map with a new, updated Map, like this:
values = values.updated("key1", values.getOrElse("key1", Array.empty[Float]) :+ float1)
To understand this you need to be clear on the distinction between mutable variables and mutable data.
var is used to create a mutable variable which means that the variable can be assigned a new value, e.g.
var name = "John"
name = "Peter" // Would not work if name was a val
By contrast mutable data is held in objects whose contents can be changed
val a = Array(1,2,3)
a(0) = 12 // Works even though a is a val not a var
In your example values is a mutable variable but the Map is immutable so it can't be changed. You have to create a new, immutable, Map and assign it to the mutable var.
From what I can see (according to ++), you would like to append Array, with one more element. But Array fixed length structure, so instead I'd recommend to use Vector. Because, I suppose, you are using immutable Map you need update it as well.
So the final solution might look like:
var values: Map[String, Vector[Float]] = Map()
val key = "key1"
val value = 1.0
values = values + (key -> (values.getOrElse(key, Vector.empty[Float]) :+ value))
Hope this helps!
You can use Scala 2.13's transform function to transform your map anyway you want.
val values = Map("key" -> Array(1f, 2f, 3f), "key2" -> Array(4f,5f,6f))
values.transform {
case ("key", v) => v ++ Array(6f)
case (_,v) => v
}
Result:
Map(key -> Array(1.0, 2.0, 3.0, 6.0), key2 -> Array(4.0, 5.0, 6.0))
Note that appending to arrays takes linear time so you might want to consider a more efficient data structure such as Vector or Queue or even a List (if you can afford to prepend rather than append).
Update:
However, if it is only one key you want to update, it is probably better to use updatedWith:
values.updatedWith("key")(_.map(_ ++ Array(6f)))
which will give the same result. The nice thing about the above code is that if the key does not exist, it will not change the map at all without throwing any error.
Immutable vs Mutable Collections
You need to choose what type of collection you will use immutable or mutable one. Both are great and works totally differently. I guess you are familiar with mutable one (from other languages), but immutable are default in scala and probably you are using it in your code (because it doesn't need any imports). Immutable Map cannot be changed... you can only create new one with updated values (Tim's and Ivan's answers covers that).
There are few ways to solve your problem and all are good depending on use case.
See implementation below (m1 to m6):
//just for convenience
type T = String
type E = Long
import scala.collection._
//immutable map with immutable seq (default).
var m1 = immutable.Map.empty[T,List[E]]
//mutable map with immutable seq. This is great for most use-cases.
val m2 = mutable.Map.empty[T,List[E]]
//mutable concurrent map with immutable seq.
//should be fast and threadsafe (if you know how to deal with it)
val m3 = collection.concurrent.TrieMap.empty[T,List[E]]
//mutable map with mutable seq.
//should be fast but could be unsafe. This is default in most imperative languages (PHP/JS/JAVA and more).
//Probably this is what You have tried to do
val m4 = mutable.Map.empty[T,mutable.ArrayBuffer[E]]
//immutable map with mutable seq.
//still could be unsafe
val m5 = immutable.Map.empty[T,mutable.ArrayBuffer[E]]
//immutable map with mutable seq v2 (used in next snipped)
var m6 = immutable.Map.empty[T,mutable.ArrayBuffer[E]]
//Oh... and NEVER DO THAT, this is wrong
//I mean... don't keep mutable Map in `var`
//var mX = mutable.Map.empty[T,...]
Other answers show immutable.Map with immutable.Seq and this is preferred way (or default at least). It costs something but for most apps it is perfectly ok. Here You have nice source of info about immutable data structures: https://stanch.github.io/reftree/talks/Immutability.html.
Each variant has it's own Pros and Cons. Each deals with updates differently, and it makes this question much harder than it looks at the first glance.
Solutions
val k = "The Ultimate Answer"
val v = 42f
//immutable map with immutable seq (default).
m1 = m1.updated(k, v :: m1.getOrElse(k, Nil))
//mutable map with immutable seq.
m2.update(k, v :: m2.getOrElse(k, Nil))
//mutable concurrent map with immutable seq.
//m3 is bit harder to do in scala 2.12... sorry :)
//mutable map with mutable seq.
m4.getOrElseUpdate(k, mutable.ArrayBuffer.empty[Float]) += v
//immutable map with mutable seq.
m5 = m5.updated(k, {
val col = m5.getOrElse(k, c.mutable.ArrayBuffer.empty[E])
col += v
col
})
//or another implementation of immutable map with mutable seq.
m6.get(k) match {
case None => m6 = m6.updated(k, c.mutable.ArrayBuffer(v))
case Some(col) => col += v
}
check scalafiddle with this implementations. https://scalafiddle.io/sf/WFBB24j/3.
This is great tool (ps: you can always save CTRL+S your changes and share link to write question about your snippet).
Oh... and if You care about concurrency (m3 case) then write another question. Such topic deserve to be in separate thread :)
(im)mutable api VS (im)mutable Collections
You can have mutable collection and still use immutable api that will copy orginal seq. For example Array is mutable:
val example = Array(1,2,3)
example(0) = 33 //edit in place
println(example.mkString(", ")) //33, 2, 3
But some functions on it (e.g. ++) will create new sequence... not change existing one:
val example2 = example ++ Array(42, 41) //++ is immutable operator
println(example.mkString(", ")) //33, 2, 3 //example stays unchanged
println(example2.mkString(", ")) //33, 2, 3, 42, 41 //but new sequence is created
There is method updateWith that is mutable and will exist only in mutable sequences. There is also updatedWith and it exists in both immutable AND mutable collections and if you are not careful enough you will use wrong one (yea ... 1 letter more).
This means you need to be careful which functions you are using, immutable or mutable one. Most of the time you can distinct them by result type. If something returns collection then it will be probably some kind of copy of original seq. It result is unit then it is mutable for sure.

Mutable Set += return value in Scala

As I understand it, the point of having a += method on mutable Sets, is that
val s = collection.mutable.Set(1)
s += 2 // s is now a mutable Set(1, 2)
has an analogous effect to
var s = Set(1) // immutable
s += 2 // s is now an immutable Set(1, 2)
If that's so, why does the += method on the mutable Set return the Set itself? Wouldn't that make code more difficult to refactor e.g.
val s = collection.mutable.Set(1)
val s1 = s += 2 // s and s1 are now mutable Set(1, 2)
can't be refactored to
var s = Set(1) // immutable
var s1 = s += 2 // s is immutable Set(1, 2), s1 is now ()
while maintaining the original meaning. What's the reason behind this design decision?
(This is obviously just a guess)
Mutable Scala collections were intended to be used as Builders for immutable collections. There is one very prominent and very ancient immutable data structure on the JVM: the String. The corresponding builder (not tied to Scala in any way) existed already in Java: it was the StringBuilder. If you look into documentation, you will see dozens of versions of an overloaded method append. Every time, it returns the StringBuilder itself, which allows you to write the following:
// java code
myBuilder
.append('h')
.append('i');
I guess that the Scala collection.mutable API simply imitated the behavior of Java's StringBuilder, but replaced the append(...) by a somewhat shorter +=. In the end of the day, it's just an implementation of the classic builder pattern.
In the immutable case, s += 2 is the same as s = s + 2. So if you make s += 2 evaluate to the new value of s, then you kind of have to make every assignment statement evaluate to the result of the assignment. Other languages do it this way, but it has historically led to bugs, famously with C code like:
if (x = 0) {
...
}
So I think it makes sense not to have that return the set.
On the other hand, for the mutable case, += is just a method name, so it doesn't do assignment and can't really be responsible for this kind of bug in a meaningful way. And having it return itself enables the kind of builder pattern chaining that is sometimes useful.

What's the difference between a set and a mapping to boolean?

In Scala, I sometimes use a Map[A, Boolean], sometimes a Set[A]. There's really not much difference between these two concepts, and an implementation might use the same data structure to implement them. So why bother to have Sets? As I said, this question occurred to me in connection with Scala, but it would arise in any programming language whose library implements a Set abstraction.
The are some specific convenient methods defined on Set (intersect, diff and more). Not a big deal, but often useful.
My first thoughts are two:
efficiency: if you only want to signal presence, why bothering with a flag that can either be true or false?
meaning: a set is about the existence of something, a map is about a correlation between a key and value (generally speaking); these two ideas are quite different and should be used accordingly to simplify reading and understanding the code
Furthermore, the semantics of application change:
val map: Map[String, Bool] = Map("hello" -> true, "world" -> false)
val set: Set[String] = Set("hello")
map("hello") // true
map("world") // false
map("none!") // EXCEPTION
set("hello") // true
set("world") // false
set("none!") // false
Without having to actually store an extra pair to indicate absence (not to mention the boolean that actually indicates such absence).
Sets are very good to indicate the presence of something, which makes them very good for filtering:
val map = (0 until 10).map(_.toString).zipWithIndex.toMap
val set = (3 to 5).map(_.toString).toSet
map.filterKeys(set) // keeps pairs 3rd through 5th
Maps, in terms of processing collections, are good to indicate relationships, which makes them very good for collecting:
set.collect(map) // more or less equivalent as above, but only values are returned
You can read more about using collections as functions to process other collections here.
There are several reasons:
1) It is easier to think/work with a data structure that only has single elements as opposed to mapping to dummy true,
For example, it is easier to convert a list to Set, then to Map:
scala> val myList = List(1,2,3,2,1)
myList: List[Int] = List(1, 2, 3, 2, 1)
scala> myList.toSet
res9: scala.collection.immutable.Set[Int] = Set(1, 2, 3)
scala> myList.map(x => (x, true)).toMap
res1: scala.collection.immutable.Map[Int,Boolean] = Map(1 -> true, 2 -> true, 3 -> true)
2) As Kombajn zbożowy pointed out, Sets have additional helper methods, union, intersect, diff, subsetOf.
3) Sense Set does not have mapping to dummy variable the size of a set in memory is smaller, this is more noticeable for small sized keys.
Having said the above, not all languages have Set data structure, Go for example does not.

immutable val vs mutable ArrayBuffer

mutable vs. immutable in Scala collections
Before I post this question, I have read the above article. Apparently if you store something in val, you can't modify it, but then if you store a mutable collection such as ArrayBuffer, you can modify it!
scala> val b = ArrayBuffer[Int](1,2,3)
b: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
scala> b += 1
res50: b.type = ArrayBuffer(1, 2, 3, 1)
scala> b
res51: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 1)
What is the use of using val to store a mutable ArrayBuffer? I assume the only reason b changes is because val b holds the memory address to that ArrayBuffer(1,2,3).
If you try var x = 1; val y = x; x = 5; y, the output will still be 1. In this case, y stores an actual value instead of the address to x.
Java doesn't have this confusion because it's clear an Object can't be assigned to an int variable .
How do I know when is the variable in scala carrying a value, when is a memory address? What's the point of storing a mutable collection in a immutable variable?
A simple answer is that vals and vars are all references. There're no primitive types in Scala. They're all objects.
val x = 1
is a reference named x that points to an immutable integer object 1. You cannot do 1.changeTo(2) or something, so if you have
val value = 5
val x = value
var y = value
You can do y += 10 This changes y to reference a new object, (5 + 10) = 15. The original 5 remains 5.
On the other hand, you cannot do x += 10 because x is a val which means it must always point to 5. So, this doesn't compile.
You may wonder why you can do val b = ArrayBuffer(...) and then b += something even though b is a val. That's because += is actually a method, not an assignment. Calling b += something gets translated to b.+=(something). The method += just adds a new element (something) to its mutable self and returns itself for further assignment.
Let's see an example
scala> val xs = ArrayBuffer(1,2,3)
xs: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
scala> val ys = ( xs += 999 )
ys: xs.type = ArrayBuffer(1, 2, 3, 999)
scala> xs
res0: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 999)
scala> ys
res1: xs.type = ArrayBuffer(1, 2, 3, 999)
scala> xs eq ys
res2: Boolean = true
This confirms xs and ys point to the same (mutable) ArrayBuffer. The eq method is like Java's ==, which compares object identity. Mutable/Immutable references (val/var) and mutable/immutable data structures (ArrayBuffer, List) are different. So, if you do another xs += 888, the ys which is an immutable reference pointing to a mutable data structure also contains 888.
What's the point of storing a mutable collection in a immutable variable
val a = new ArrayBuffer(1)
a = new ArrayBuffer[Int]()
<console>:9: error: reassignment to val
It prevents the variable from being assigned to a new memory address. In practice though scala encourages you not to use mutable state (to avoid locking, blocking, etc), so I'm having trouble coming up with an example for a real situation where the choice of var or val for mutable state matters.
Immutable object and constant value are two different things.
If you define your collection as val means that the referenced instance of the collection will always be the same. But this instance can be mutable or immutable: if it is immutable you cannot add or remove items in that instance, vice versa if it is mutable you can do it. When a collection is immutable to add or remove items you always create a copy.
How do I know when is the variable in scala carrying a value, when is a memory address?
Scala always runs on the JVM (.NET support was discontinued), so types that are primitive types on JVM will be treated as primitive types by Scala.
What is the use of using val to store a mutable ArrayBuffer?
The closest alternative would be to use a var to store an immutable Seq. If that Seq was very large, you wouldn't want to have to copy the whole Seq every time you made a change to it - but that's what you might have to do! That would be very slow!

Should Scala's map() behave differently when mapping to the same type?

In the Scala Collections framework, I think there are some behaviors that are counterintuitive when using map().
We can distinguish two kinds of transformations on (immutable) collections. Those whose implementation calls newBuilder to recreate the resulting collection, and those who go though an implicit CanBuildFrom to obtain the builder.
The first category contains all transformations where the type of the contained elements does not change. They are, for example, filter, partition, drop, take, span, etc. These transformations are free to call newBuilder and to recreate the same collection type as the one they are called on, no matter how specific: filtering a List[Int] can always return a List[Int]; filtering a BitSet (or the RNA example structure described in this article on the architecture of the collections framework) can always return another BitSet (or RNA). Let's call them the filtering transformations.
The second category of transformations need CanBuildFroms to be more flexible, as the type of the contained elements may change, and as a result of this, the type of the collection itself maybe cannot be reused: a BitSet cannot contain Strings; an RNA contains only Bases. Examples of such transformations are map, flatMap, collect, scanLeft, ++, etc. Let's call them the mapping transformations.
Now here's the main issue to discuss. No matter what the static type of the collection is, all filtering transformations will return the same collection type, while the collection type returned by a mapping operation can vary depending on the static type.
scala> import collection.immutable.TreeSet
import collection.immutable.TreeSet
scala> val treeset = TreeSet(1,2,3,4,5) // static type == dynamic type
treeset: scala.collection.immutable.TreeSet[Int] = TreeSet(1, 2, 3, 4, 5)
scala> val set: Set[Int] = TreeSet(1,2,3,4,5) // static type != dynamic type
set: Set[Int] = TreeSet(1, 2, 3, 4, 5)
scala> treeset.filter(_ % 2 == 0)
res0: scala.collection.immutable.TreeSet[Int] = TreeSet(2, 4) // fine, a TreeSet again
scala> set.filter(_ % 2 == 0)
res1: scala.collection.immutable.Set[Int] = TreeSet(2, 4) // fine
scala> treeset.map(_ + 1)
res2: scala.collection.immutable.SortedSet[Int] = TreeSet(2, 3, 4, 5, 6) // still fine
scala> set.map(_ + 1)
res3: scala.collection.immutable.Set[Int] = Set(4, 5, 6, 2, 3) // uh?!
Now, I understand why this works like this. It is explained there and there. In short: the implicit CanBuildFrom is inserted based on the static type, and, depending on the implementation of its def apply(from: Coll) method, may or may not be able to recreate the same collection type.
Now my only point is, when we know that we are using a mapping operation yielding a collection with the same element type (which the compiler can statically determine), we could mimic the way the filtering transformations work and use the collection's native builder. We can reuse BitSet when mapping to Ints, create a new TreeSet with the same ordering, etc.
Then we would avoid cases where
for (i <- set) {
val x = i + 1
println(x)
}
does not print the incremented elements of the TreeSet in the same order as
for (i <- set; x = i + 1)
println(x)
So:
Do you think this would be a good idea to change the behavior of the mapping transformations as described?
What are the inevitable caveats I have grossly overlooked?
How could it be implemented?
I was thinking about something like an implicit sameTypeEvidence: A =:= B parameter, maybe with a default value of null (or rather an implicit canReuseCalleeBuilderEvidence: B <:< A = null), which could be used at runtime to give more information to the CanBuildFrom, which in turn could be used to determine the type of builder to return.
I looked again at it, and I think your problem doesn't arise from a particular deficiency of Scala collections, but rather a missing builder for TreeSet. Because the following does work as intended:
val list = List(1,2,3,4,5)
val seq1: Seq[Int] = list
seq1.map( _ + 1 ) // yields List
val vector = Vector(1,2,3,4,5)
val seq2: Seq[Int] = vector
seq2.map( _ + 1 ) // yields Vector
So the reason is that TreeSet is missing a specialised companion object/builder:
seq1.companion.newBuilder[Int] // ListBuffer
seq2.companion.newBuilder[Int] // VectorBuilder
treeset.companion.newBuilder[Int] // Set (oops!)
So my guess is, if you take proper provision for such a companion for your RNA class, you may find that both map and filter work as you wish...?