mutable vs. immutable in Scala collections
Before posting this question, I read the article above. Apparently, if you store something in a val you can't modify it, yet if you store a mutable collection such as an ArrayBuffer in one, you can modify it!
scala> val b = ArrayBuffer[Int](1,2,3)
b: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
scala> b += 1
res50: b.type = ArrayBuffer(1, 2, 3, 1)
scala> b
res51: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 1)
What is the use of using val to store a mutable ArrayBuffer? I assume the only reason b changes is that val b holds the memory address of that ArrayBuffer(1,2,3).
If you try var x = 1; val y = x; x = 5; y, the output will still be 1. In this case, y stores an actual value instead of the address of x.
Java doesn't have this confusion because it's clear that an Object can't be assigned to an int variable.
How do I know when a variable in Scala is carrying a value and when it is carrying a memory address? What's the point of storing a mutable collection in an immutable variable?
A simple answer is that vals and vars are all references. At the language level there are no primitive types in Scala; every value is an object.
val x = 1
is a reference named x that points to an immutable integer object 1. You cannot do 1.changeTo(2) or something, so if you have
val value = 5
val x = value
var y = value
You can do y += 10. This changes y to reference a new object, (5 + 10) = 15. The original 5 remains 5.
On the other hand, you cannot do x += 10 because x is a val which means it must always point to 5. So, this doesn't compile.
You may wonder why you can do val b = ArrayBuffer(...) and then b += something even though b is a val. That's because += is actually a method here, not an assignment. Calling b += something gets translated to b.+=(something). The method += just adds a new element (something) to its mutable self and returns itself, so the result can be used in a further expression or assignment.
Let's see an example
scala> val xs = ArrayBuffer(1,2,3)
xs: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3)
scala> val ys = ( xs += 999 )
ys: xs.type = ArrayBuffer(1, 2, 3, 999)
scala> xs
res0: scala.collection.mutable.ArrayBuffer[Int] = ArrayBuffer(1, 2, 3, 999)
scala> ys
res1: xs.type = ArrayBuffer(1, 2, 3, 999)
scala> xs eq ys
res2: Boolean = true
This confirms xs and ys point to the same (mutable) ArrayBuffer. The eq method is like Java's == on objects: it compares object identity. Mutable/immutable references (var/val) and mutable/immutable data structures (ArrayBuffer/List) are two different things. So, if you do another xs += 888, then ys, which is an immutable reference pointing to a mutable data structure, also contains 888.
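To put the two meanings of += side by side, here is a minimal sketch, assuming Scala 2.12 or later. The object PlusEqualsSketch and its member names are made up for illustration; they are not from the answer above.

```scala
import scala.collection.mutable.ArrayBuffer

object PlusEqualsSketch {
  // val + mutable collection: += is a method call that mutates the buffer in place.
  val buf = ArrayBuffer(1, 2, 3)
  val alias = buf     // a second reference to the same buffer
  buf += 4            // same as buf.+=(4)

  // var + immutable collection: List has no :+= method, so the compiler
  // rewrites the next statement to list = list :+ 4 and re-points the variable.
  var list = List(1, 2, 3)
  val snapshot = list // still refers to the original List
  list :+= 4

  def main(args: Array[String]): Unit = {
    println(alias)    // the alias sees the appended element
    println(snapshot) // the snapshot does not
  }
}
```

The first half only compiles because ArrayBuffer defines +=; the second half only compiles because list is a var.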
What's the point of storing a mutable collection in an immutable variable?
val a = new ArrayBuffer(1)
a = new ArrayBuffer[Int]()
<console>:9: error: reassignment to val
It prevents the variable from being re-pointed at a different object. In practice, though, Scala encourages you not to use mutable state (to avoid locking, blocking, etc.), so I'm having trouble coming up with a real situation where the choice of var or val for mutable state matters.
Immutable object and constant value are two different things.
Defining your collection as a val means that the referenced collection instance will always be the same one. That instance can still be mutable or immutable: if it is immutable you cannot add or remove items in that instance, whereas if it is mutable you can. To add or remove items with an immutable collection, you always create a new collection.
How do I know when a variable in Scala is carrying a value and when it is carrying a memory address?
Scala runs primarily on the JVM (.NET support was discontinued), so types that are primitive on the JVM are treated as primitives by Scala where possible.
What is the use of using val to store a mutable ArrayBuffer?
The closest alternative would be to use a var to store an immutable Seq. If that Seq was very large, you wouldn't want to have to copy the whole Seq every time you made a change to it - but that's what you might have to do! That would be very slow!
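Whether a full copy actually happens depends on the collection and the operation. As a small illustration (a sketch with made-up names, not a benchmark), prepending to an immutable List allocates a single new cell and reuses the existing list as its tail:

```scala
object SharingSketch {
  val big: List[Int] = List.range(0, 1000000)

  // Prepending creates one new cons cell whose tail is the existing list.
  val bigger: List[Int] = -1 :: big

  // eq compares identity: the tail of the new list is the very same object.
  val sharesTail: Boolean = bigger.tail eq big
}
```

Operations that cannot share structure this way (appending to an array-backed immutable sequence, for example) do pay for a copy, which is the cost the answer above warns about.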
Related
In Scala, you can have two types of set, where the elements are immutable or mutable. But as you cannot index these sets, how can you change the elements of the latter sets?
In Scala, you can have two types of set, where the elements are immutable or mutable
That is not what the distinction between scala.collection.immutable.Set and scala.collection.mutable.Set is. It is not about mutability or immutability of the elements, it is about mutability or immutability of the sets themselves.
But as you cannot index these sets, how can you change the elements of the latter sets?
It is not about changing the elements of mutable sets. It is about changing the mutable sets themselves:
import scala.collection.mutable.{ Set => MSet}
import scala.collection.immutable.{Set => ISet}
val mset = MSet(1, 2, 3)
val iset = ISet(1, 2, 3)
mset += 4
mset //=> HashSet(1, 2, 3, 4): scala.collection.mutable.Set[Int]
val iset2 = iset + 4
iset //=> Set(1, 2, 3): scala.collection.immutable.Set[Int]
iset2 //=> Set(1, 2, 3, 4): scala.collection.immutable.Set[Int]
iset += 4
// ERROR: value += is not a member of scala.collection.immutable.Set[Int]
The difference between the two is that you can't add an element to an immutable set. Instead, when you call the scala.collection.immutable.Set.+(elem: A): Set[A] method, it will return a new set (iset2 in this case) that has the same elements as the original set (iset) plus the element we wanted to add.
Whereas the scala.collection.mutable.Set.+=(elem: A): Set.this.type method returns the same set (and in fact, as you can see in my example above, I actually ignore the return value) and mutates it to contain the additional element.
In Scala, the idea of a set is much closer to the mathematical idea of a set than to a mere container. A Set[A] is itself a predicate function, A => Boolean, that you can ask "is this element a member of the set?"
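The "set as predicate" view can be sketched as follows. The object and value names are illustrative only.

```scala
object SetAsPredicate {
  val vowels: Set[Char] = Set('a', 'e', 'i', 'o', 'u')

  // Applying the set to an element is a membership test, the same as contains.
  val hasE: Boolean = vowels('e')
  val hasZ: Boolean = vowels('z')

  // Because a Set[Char] is also a Char => Boolean, it can be passed
  // wherever a predicate on Char is expected.
  val onlyVowels: String = "immutable".filter(vowels)
}
```

Here onlyVowels keeps just the characters of "immutable" that are members of vowels.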
scala> var immSet = Set("A", "B")
immSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> immSet += "C"
scala> println(immSet)
Set(A, B, C)
I wonder, what advantage am I getting by allowing var to be used with an immutable Set? Am I not losing immutability in this case?
What advantage am I getting by allowing var to be used with an immutable Set?
I would say this can mainly cause confusion. The fact that you're using a var allows you to overwrite the variable, but the Set itself doesn't change; a new Set is allocated with the additional value "C". And since you're using a var, the previous Set is no longer referenced, unless you've kept a reference to it somewhere else:
scala> var firstSet = Set("A", "B")
firstSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> var secondSet = firstSet
secondSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> firstSet += "C"
scala> firstSet
res6: scala.collection.immutable.Set[String] = Set(A, B, C)
scala> secondSet
res7: scala.collection.immutable.Set[String] = Set(A, B)
Because secondSet still points to the Set that firstSet originally referred to, we don't see the update reflected there. I think using a val adds clarity: it makes plain that the underlying Set is immutable as well as the variable pointing to it. When you use a val, the compiler will complain if you attempt to reassign it, forcing you to realize that a new collection is being created.
Regarding immutability, we need to divide this into two. There is the immutability of the Set, and there is the immutability of the variable pointing to that Set. These are two different things, and you lose the latter with this approach.
Read @YuvalItzchakov's answer if you want to understand the basic difference between an immutable collection defined as a var and a mutable collection defined as a val. I'll concentrate on the practical aspects of both approaches.
First of all, both approaches imply mutability. If you want to stay "pure functional" you should avoid either of them.
Now, if you want mutable collection, what is the best way? Short answer, it depends.
Performance. Mutable collections are usually faster than their immutable counterparts. This means that if your mutable collection is contained (for example, it doesn't escape a private method), it may be better to use val c = MutableCollection(). Most of the methods in Scala's collections API use mutable collections internally.
Thread safety. A value of an immutable collection is always thread safe. You can send it to another thread without having to think about visibility or concurrent changes.
var a = ImmutableCol()
otherThreadProcessor.process(a)
a += 1 // otherThread will still have previous value
On the other hand, if you want to modify a collection from multiple threads, it is better to use Java's concurrent collections API (java.util.concurrent).
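For the multi-threaded case, a rough sketch using a concurrent set from java.util.concurrent might look like this. The object ConcurrentSetSketch, the method fillConcurrently and its parameters are invented for the example, and plain threads are used only to keep it self-contained.

```scala
import java.util.concurrent.ConcurrentHashMap

object ConcurrentSetSketch {
  // Starts `threads` workers that each add `perThread` distinct Ints to one
  // shared concurrent set, waits for them, and returns the final size.
  def fillConcurrently(threads: Int, perThread: Int): Int = {
    val shared = ConcurrentHashMap.newKeySet[Int]()
    val workers = (0 until threads).map { t =>
      new Thread(new Runnable {
        override def run(): Unit = {
          var i = 0
          while (i < perThread) {
            shared.add(t * perThread + i) // safe to call from several threads
            i += 1
          }
        }
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    shared.size()
  }
}
```

Note that shared is a val: the reference never changes, and all coordination happens inside the concurrent collection.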
Code clarity. Imagine you have a function that takes a collection as an argument and then modifies it in some way. If the collection is mutable then, after the function returns, the collection that was passed as an argument stays modified.
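import scala.collection.mutable
def recImmutable(a: Set[Int]): Unit = {
  var b = a
  b += 4 // re-points the local var b at a new Set; a is untouched
}
val a = Set(2, 3)
recImmutable(a)
println(a)
// prints Set(2, 3)
def recMutable(a: mutable.Set[Int]): Unit = {
  var b = a
  b += 4 // mutates the one set that both a and b refer to
}
val b = mutable.Set(2, 3)
recMutable(b)
println(b)
// prints Set(2, 3, 4)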
Suppose I want to construct a BitSet containing all integers from 0 until n satisfying some predicate f: Int => Boolean.
I could write something like
BitSet((0 until n):_*).filter(f)
which of course works. But it feels rather inefficient! I'm planning on doing this inside a pretty tight loop, and would like suggestions for more efficient ways.
This is the best I could come up with at the moment
BitSet((0 until n).view.filter(f):_*)
The view part makes the filter method lazy. This makes sure that when the BitSet is created from the given sequence, it will filter on the fly. Your original suggestion creates a new BitSet after the first one is created.
If performance is truly your major concern, the best option is probably to use a mutable.BitSet and a while loop, and then call toImmutable on the result.
val bitSet = {
  val tmp = new scala.collection.mutable.BitSet(n)
  var i = 0
  while (i < n) {
    if (f(i)) {
      tmp += i
    }
    i = i + 1
  }
  tmp.toImmutable
}
I think the most efficient "functional" way is to use foldLeft:
(0 until n).foldLeft(BitSet())((s,i) => if (f(i)) s + i else s)
It doesn't create an intermediate sequence but constructs the set from scratch while filtering (though each s + i on an immutable BitSet returns a new BitSet).
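Another option in the same spirit as the mutable.BitSet loop is to go through the immutable BitSet's own builder. The following is only a sketch: the object BitSetSketch and the method names viaFilter and viaBuilder are made up here, and no performance claim is implied without measuring.

```scala
import scala.collection.immutable.BitSet

object BitSetSketch {
  // Baseline from the question: build the full set, then filter it.
  def viaFilter(n: Int)(f: Int => Boolean): BitSet =
    BitSet((0 until n): _*).filter(f)

  // Builder variant: mutable while being filled, immutable once result() is called.
  def viaBuilder(n: Int)(f: Int => Boolean): BitSet = {
    val builder = BitSet.newBuilder
    var i = 0
    while (i < n) {
      if (f(i)) builder += i
      i += 1
    }
    builder.result()
  }
}
```

Both functions should return equal sets for the same n and f, which makes it easy to check one against the other before timing them.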
The first thing I thought of was to use breakOut, but it doesn't work for filter:
scala> val set: BitSet = (0 until 10).filter(f)(collection.breakOut)
<console>:11: error: polymorphic expression cannot be instantiated to expected type;
found : [From, T, To]scala.collection.generic.CanBuildFrom[From,T,To]
required: Int
val set: BitSet = (0 until 10).filter(f)(collection.breakOut)
^
scala> val set: BitSet = (0 until 10).map(_+1)(collection.breakOut)
set: scala.collection.immutable.BitSet = BitSet(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
breakOut doesn't create an intermediate collection either, but because filter doesn't have a second parameter list, it can't work here. Note that breakOut only exists up to Scala 2.12; it was removed in 2.13.
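On Scala 2.13 or later, where breakOut is no longer available, a similar effect can be sketched by filtering an iterator and converting it with to. This assumes the 2.13 collections library; the object ToBitSetSketch and the method build are hypothetical names.

```scala
import scala.collection.immutable.BitSet

object ToBitSetSketch {
  // Assumes Scala 2.13+: filter on an Iterator is lazy, and to(BitSet)
  // feeds the surviving elements into a BitSet without an intermediate Seq.
  def build(n: Int)(f: Int => Boolean): BitSet =
    (0 until n).iterator.filter(f).to(BitSet)
}
```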
Does anyone know how to create a lazy iterator in Scala?
For example, I want to iterate through a sequence, instantiating each element only as I reach it. Once I have moved past an element, I want the instance to die / be removed from memory.
If I declare an iterator like so:
val xs = Iterator(
  (0 to 10000).toArray,
  (0 to 10).toArray,
  (0 to 100000000).toArray)
It creates the arrays when xs is declared. This can be proven like so:
def f(name: String) = {
  val x = (0 to 10000).toArray
  println("f: " + name)
  x
}
val xs = Iterator(f("1"),f("2"),f("3"))
which prints:
scala> val xs = Iterator(f("1"),f("2"),f("3"))
f: 1
f: 2
f: 3
xs: Iterator[Array[Int]] = non-empty iterator
Anyone have any ideas?
Streams are not suitable because elements remain in memory.
Note: I am using an Array as an example, but I want it to work with any type.
Scala collections have a view method which produces a lazy equivalent of the collection. So instead of (0 to 10000).toArray, use (0 to 10000).view. This way, no array is created in memory. See also https://stackoverflow.com/a/6996166/90874, https://stackoverflow.com/a/4799832/90874, https://stackoverflow.com/a/4511365/90874 etc.
Use one of the Iterator factory methods that accept a call-by-name parameter.
For your first example you can do one of these:
val xs1 = Iterator.fill(3)((0 to 10000).toArray)
val xs2 = Iterator.tabulate(3)(_ => (0 to 10000).toArray)
val xs3 = Iterator.continually((0 to 10000).toArray).take(3)
Arrays won't be allocated until you need them.
In case you need different expressions for each element, you can create separate iterators and concatenate them:
val iter = Iterator.fill(1)(f("1")) ++
Iterator.fill(1)(f("2")) ++
Iterator.fill(1)(f("3"))
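If each element needs its own expression, another way to get the same laziness is to wrap every expression in a function and map over an iterator of those functions. This is a sketch only: the helper lazily, the function make and the counter built are hypothetical names introduced to make the laziness observable.

```scala
object LazyIteratorSketch {
  var built = 0 // counts how many elements have been constructed so far

  def make(n: Int): Array[Int] = {
    built += 1
    Array.fill(n)(0)
  }

  // Each expression is wrapped in a function; map on an Iterator is lazy,
  // so a thunk only runs when next() reaches it.
  def lazily[A](thunks: (() => A)*): Iterator[A] =
    thunks.iterator.map(thunk => thunk())

  val xs: Iterator[Array[Int]] =
    lazily(() => make(10), () => make(20), () => make(30))

  val builtBeforeUse: Int = built          // nothing constructed yet
  val firstLength: Int = xs.next().length  // constructs only the first array
  val builtAfterFirst: Int = built         // exactly one construction so far
}
```

Once the iterator has moved past an element and nothing else holds a reference to it, that element becomes eligible for garbage collection.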
I have the following map in Scala:
var m = Map[Int,Set[Int]]()
m += 1 -> Set(1)
m(1) += 2
I've discovered that the last line doesn't work. I get "error: reassignment to val".
So I tried
var s = m(1)
s += 2
Then when I compared m(1) with s after I added 2 to it, their contents were different. So how can I add an element to a set which is the value of a map?
I come from a Java/C++ background so what I tried seems natural to me, but apparently it's not in Scala.
You're probably using immutable.Map. You need to use mutable.Map, or else replace the set with an updated copy (in a new immutable map) instead of trying to modify it in place.
Here's a reference describing the mutable and immutable data structures.
So...
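import scala.collection.mutable.Map
val m = Map[Int,Set[Int]]()  // a val suffices: the map itself is mutable
m += 1 -> Set(1)
m(1) += 2  // expands to m(1) = m(1) + 2, i.e. m.update(1, m(1) + 2)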
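The other route mentioned above, keeping everything immutable and replacing the set with an updated copy, could be sketched like this. The object ImmutableUpdateSketch and the helper addToSet are illustrative names, not part of any library.

```scala
object ImmutableUpdateSketch {
  // Returns a new map in which the set stored under `key` also contains `elem`.
  // A missing key is treated as an empty set.
  def addToSet(m: Map[Int, Set[Int]], key: Int, elem: Int): Map[Int, Set[Int]] =
    m.updated(key, m.getOrElse(key, Set.empty[Int]) + elem)

  var m = Map[Int, Set[Int]]()
  m += 1 -> Set(1)      // rewritten to m = m + (1 -> Set(1))
  m = addToSet(m, 1, 2) // re-point m at the updated map
}
```

Neither the old map nor the old set is changed; m simply refers to a new map afterwards.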
In addition to @Stefan's answer:
instead of using a mutable Map, you can use a mutable Set
import scala.collection.mutable.{Set => mSet}
var m = Map[Int,mSet[Int]]()
m += 1 -> mSet(1)
m(1) += 2
mSet is an alias for the mutable Set, introduced to reduce verbosity.
scala> m
res9: scala.collection.immutable.Map[Int,scala.collection.mutable.Set[Int]] = Map(1 -> Set(2, 1))
I think what you really want here is a MultiMap (note that mutable.MultiMap is deprecated as of Scala 2.13):
import collection.mutable.{Set, Map, HashMap, MultiMap}
val m = new HashMap[Int,Set[Int]] with MultiMap[Int, Int]
m.addBinding(1,1)
m.addBinding(1,2)
m.addBinding(2,3)
Note that m itself is a val, as it's the map itself which is now mutable, not the reference to the map.
At this point, m will be:
Map(
1 -> Set(1,2),
2 -> Set(3)
)
Unfortunately, there's no immutable equivalent to MultiMap, and you have to specify the concrete subclass of mutable.Map that you'll use at construction time.
For all subsequent operations, it's enough to pass the thing around typed as a MultiMap[Int,Int].
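Where MultiMap is not an option, roughly the same addBinding behaviour can be sketched with a plain mutable.Map of mutable.Sets. The object BindingSketch and its addBinding helper are hypothetical; they only mirror the method name used above.

```scala
import scala.collection.mutable

object BindingSketch {
  val m = mutable.Map.empty[Int, mutable.Set[Int]]

  // Mirrors MultiMap.addBinding: create the set on first use, then add to it.
  def addBinding(key: Int, value: Int): Unit = {
    m.getOrElseUpdate(key, mutable.Set.empty[Int]) += value
    ()
  }

  addBinding(1, 1)
  addBinding(1, 2)
  addBinding(2, 3)
}
```

As before, m is a val: the bindings change, the reference to the map does not.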