Mutable Set += return value in Scala - scala

As I understand it, the point of having a += method on mutable Sets, is that
val s = collection.mutable.Set(1)
s += 2 // s is now a mutable Set(1, 2)
has an analogous effect to
var s = Set(1) // immutable
s += 2 // s is now an immutable Set(1, 2)
If that's so, why does the += method on the mutable Set return the Set itself? Wouldn't that make code more difficult to refactor e.g.
val s = collection.mutable.Set(1)
val s1 = s += 2 // s and s1 are now mutable Set(1, 2)
can't be refactored to
var s = Set(1) // immutable
var s1 = s += 2 // s is immutable Set(1, 2), s1 is now ()
while maintaining the original meaning. What's the reason behind this design decision?

(This is obviously just a guess)
Mutable Scala collections were intended to be used as Builders for immutable collections. There is one very prominent and very ancient immutable data structure on the JVM: the String. The corresponding builder (not tied to Scala in any way) existed already in Java: it was the StringBuilder. If you look into documentation, you will see dozens of versions of an overloaded method append. Every time, it returns the StringBuilder itself, which allows you to write the following:
// java code
myBuilder
.append('h')
.append('i');
I guess that the Scala collection.mutable API simply imitated the behavior of Java's StringBuilder, but replaced the append(...) by a somewhat shorter +=. In the end of the day, it's just an implementation of the classic builder pattern.

In the immutable case, s += 2 is the same as s = s + 2. So if you make s += 2 evaluate to the new value of s, then you kind of have to make every assignment statement evaluate to the result of the assignment. Other languages do it this way, but it has historically led to bugs, famously with C code like:
if (x = 0) {
...
}
So I think it makes sense not to have that return the set.
On the other hand, for the mutable case, += is just a method name, so it doesn't do assignment and can't really be responsible for this kind of bug in a meaningful way. And having it return itself enables the kind of builder pattern chaining that is sometimes useful.

Related

Can I mutate a variable in place in a purely functional way?

I know I can use state passing and state monads for purely functional mutation, but afaik that's not in-place and I want the performance benefits of doing it in-place.
An example would be great, e.g. adding 1 to a number, preferably in Idris but Scala will also be good
p.s. is there a tag for mutation? can't see one
No, this is not possible in Scala.
It is however possible to achieve the performance benefits of in-place mutation in a purely functional language. For instance, let's take a function that updates an array in a purely functional way:
def update(arr: Array[Int], idx: Int, value: Int): Array[Int] =
arr.take(idx) ++ Array(value) ++ arr.drop(idx + 1)
We need to copy the array here in order to maintain purity. The reason is that if we mutated it in place, we'd be able to observe that after calling the function:
def update(arr: Array[Int], idx: Int, value: Int): Array[Int] = {
arr(idx) = value
arr
}
The following code will work fine with the first implementation but break with the second:
val arr = Array(1, 2, 3)
assert(arr(1) == 2)
val arr2 = update(arr, 1, 42)
assert(arr2(1) == 42) // so far, so good…
assert(arr(1) == 2) // oh noes!
The solution in a purely functional language is to simply forbid the last assert. If you can't observe the fact that the original array was mutated, then there's nothing wrong with updating the array in place! The means to achieve this is called linear types. Linear values are values that you can use exactly once. Once you've passed a linear value to a function, the compiler will not allow you to use it again, which fixes the problem.
There are two languages I know of that have this feature: ATS and Haskell. If you want more details, I'd recommend this talk by Simon Peyton-Jones where he explains the implementation in Haskell:
https://youtu.be/t0mhvd3-60Y
Support for linear types has since been merged into GHC: https://www.tweag.io/blog/2020-06-19-linear-types-merged/

Scala: Update Array inside a Map

I am creating a Map which has an Array inside it. I need to keep adding values to that Array. How do I do that?
var values: Map[String, Array[Float]] = Map()
I tried several ways such as:
myobject.values.getOrElse("key1", Array()).++(Array(float1))
Few other ways to but nothing updates the array inside the Map.
There is a problem with this code:
values.getOrElse("key1", Array()).++(Array(float1))
This does not update the Map in values, it just creates a new Array and then throws it away.
You need to replace the original Map with a new, updated Map, like this:
values = values.updated("key1", values.getOrElse("key1", Array.empty[Float]) :+ float1)
To understand this you need to be clear on the distinction between mutable variables and mutable data.
var is used to create a mutable variable which means that the variable can be assigned a new value, e.g.
var name = "John"
name = "Peter" // Would not work if name was a val
By contrast mutable data is held in objects whose contents can be changed
val a = Array(1,2,3)
a(0) = 12 // Works even though a is a val not a var
In your example values is a mutable variable but the Map is immutable so it can't be changed. You have to create a new, immutable, Map and assign it to the mutable var.
From what I can see (according to ++), you would like to append Array, with one more element. But Array fixed length structure, so instead I'd recommend to use Vector. Because, I suppose, you are using immutable Map you need update it as well.
So the final solution might look like:
var values: Map[String, Vector[Float]] = Map()
val key = "key1"
val value = 1.0
values = values + (key -> (values.getOrElse(key, Vector.empty[Float]) :+ value))
Hope this helps!
You can use Scala 2.13's transform function to transform your map anyway you want.
val values = Map("key" -> Array(1f, 2f, 3f), "key2" -> Array(4f,5f,6f))
values.transform {
case ("key", v) => v ++ Array(6f)
case (_,v) => v
}
Result:
Map(key -> Array(1.0, 2.0, 3.0, 6.0), key2 -> Array(4.0, 5.0, 6.0))
Note that appending to arrays takes linear time so you might want to consider a more efficient data structure such as Vector or Queue or even a List (if you can afford to prepend rather than append).
Update:
However, if it is only one key you want to update, it is probably better to use updatedWith:
values.updatedWith("key")(_.map(_ ++ Array(6f)))
which will give the same result. The nice thing about the above code is that if the key does not exist, it will not change the map at all without throwing any error.
Immutable vs Mutable Collections
You need to choose what type of collection you will use immutable or mutable one. Both are great and works totally differently. I guess you are familiar with mutable one (from other languages), but immutable are default in scala and probably you are using it in your code (because it doesn't need any imports). Immutable Map cannot be changed... you can only create new one with updated values (Tim's and Ivan's answers covers that).
There are few ways to solve your problem and all are good depending on use case.
See implementation below (m1 to m6):
//just for convenience
type T = String
type E = Long
import scala.collection._
//immutable map with immutable seq (default).
var m1 = immutable.Map.empty[T,List[E]]
//mutable map with immutable seq. This is great for most use-cases.
val m2 = mutable.Map.empty[T,List[E]]
//mutable concurrent map with immutable seq.
//should be fast and threadsafe (if you know how to deal with it)
val m3 = collection.concurrent.TrieMap.empty[T,List[E]]
//mutable map with mutable seq.
//should be fast but could be unsafe. This is default in most imperative languages (PHP/JS/JAVA and more).
//Probably this is what You have tried to do
val m4 = mutable.Map.empty[T,mutable.ArrayBuffer[E]]
//immutable map with mutable seq.
//still could be unsafe
val m5 = immutable.Map.empty[T,mutable.ArrayBuffer[E]]
//immutable map with mutable seq v2 (used in next snipped)
var m6 = immutable.Map.empty[T,mutable.ArrayBuffer[E]]
//Oh... and NEVER DO THAT, this is wrong
//I mean... don't keep mutable Map in `var`
//var mX = mutable.Map.empty[T,...]
Other answers show immutable.Map with immutable.Seq and this is preferred way (or default at least). It costs something but for most apps it is perfectly ok. Here You have nice source of info about immutable data structures: https://stanch.github.io/reftree/talks/Immutability.html.
Each variant has it's own Pros and Cons. Each deals with updates differently, and it makes this question much harder than it looks at the first glance.
Solutions
val k = "The Ultimate Answer"
val v = 42f
//immutable map with immutable seq (default).
m1 = m1.updated(k, v :: m1.getOrElse(k, Nil))
//mutable map with immutable seq.
m2.update(k, v :: m2.getOrElse(k, Nil))
//mutable concurrent map with immutable seq.
//m3 is bit harder to do in scala 2.12... sorry :)
//mutable map with mutable seq.
m4.getOrElseUpdate(k, mutable.ArrayBuffer.empty[Float]) += v
//immutable map with mutable seq.
m5 = m5.updated(k, {
val col = m5.getOrElse(k, c.mutable.ArrayBuffer.empty[E])
col += v
col
})
//or another implementation of immutable map with mutable seq.
m6.get(k) match {
case None => m6 = m6.updated(k, c.mutable.ArrayBuffer(v))
case Some(col) => col += v
}
check scalafiddle with this implementations. https://scalafiddle.io/sf/WFBB24j/3.
This is great tool (ps: you can always save CTRL+S your changes and share link to write question about your snippet).
Oh... and if You care about concurrency (m3 case) then write another question. Such topic deserve to be in separate thread :)
(im)mutable api VS (im)mutable Collections
You can have mutable collection and still use immutable api that will copy orginal seq. For example Array is mutable:
val example = Array(1,2,3)
example(0) = 33 //edit in place
println(example.mkString(", ")) //33, 2, 3
But some functions on it (e.g. ++) will create new sequence... not change existing one:
val example2 = example ++ Array(42, 41) //++ is immutable operator
println(example.mkString(", ")) //33, 2, 3 //example stays unchanged
println(example2.mkString(", ")) //33, 2, 3, 42, 41 //but new sequence is created
There is method updateWith that is mutable and will exist only in mutable sequences. There is also updatedWith and it exists in both immutable AND mutable collections and if you are not careful enough you will use wrong one (yea ... 1 letter more).
This means you need to be careful which functions you are using, immutable or mutable one. Most of the time you can distinct them by result type. If something returns collection then it will be probably some kind of copy of original seq. It result is unit then it is mutable for sure.

Scala Sets Immutability

scala> var immSet = Set("A", "B")
immSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> immSet += "C"
scala> println(immSet)
Set(A, B, C)
I wonder, what is the advantage I am getting by allowing var to be used with with an immutable Set? Am I not losing immutability in this case?
What is the advantage I am getting by allowing var to be used with
with a Immutabable Sets?
I would say this can mainly cause confusion. The fact that you're using a var allows you to overwrite the variable, but, the Set by itself doesn't change, it allocates a new set with the additional value "C". But since you're using a var, the previous Set is now no longer referenced, unless you've referenced is somewhere else higher up the stack:
scala> var firstSet = Set("A", "B")
firstSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> var secondSet = firstSet
secondSet: scala.collection.immutable.Set[String] = Set(A, B)
scala> firstSet += "C"
scala> firstSet
res6: scala.collection.immutable.Set[String] = Set(A, B, C)
scala> secondSet
res7: scala.collection.immutable.Set[String] = Set(A, B)
Because secondSet still points to the Set created by firstSet, we don't see the value update reflected. I think making this immutable adds clarity that the underlying Set is immutable and well as the variable pointing to it. When you use a val, the compiler will yell if you attempt to reassign, forcing you to realize that a new collection is initialized.
Regarding immutability, we need to divide this into two. There is the immutability of the Set, and there is the immutability of the variable pointing to that Set, these are two different things. You lose the latter with this approach.
Read #YuvalItzchakov's answer if you want to understand what is the basic difference between immutable collection defined as var and mutable collection defined as val. I'll concentrate on practical aspects of both approaches.
First of all, both approaches imply mutability. If you want to stay "pure functional" you should avoid either of them.
Now, if you want mutable collection, what is the best way? Short answer, it depends.
Performance. Mutable collections are usually faster than their immutable counterparts. It means, that if your mutable variable is somehow contained (for example, doesn't escape private method), it may be better to use val c = MutableCollection(). Most of the Scala's methods in Collections API internally use mutable collections.
Thread safety. Value of immutable collection is always thread safe. You can send it to another thread and don't think about visibility and concurrent changes.
var a = ImmutableCol()
otherThreadProcessor.process(a)
a += 1 // otherThread will still have previous value
On the other hand, if you want to modify collection from multiple threads, better use Java's concurrent collection API.
Code clarity. Imagine you have some function that takes collection as an argument and then modifies it in some ways. If collection is mutable, then, after function returns, collection, passed as an argument, will stay modified.
def recImmutable(a:Set[Int]): Unit = {
var b = a
b += 4
}
val a = Set(2,3)
recImmutable(a)
println(a)
// prints Set(2, 3)
def recMutable(a:mutable.Set[Int]): Unit = {
var b = a
b += 4
}
val b = mutable.Set(2,3)
recMutable(b)
println(b)
// prints Set(2, 3, 4)

Scala type mismatch when adding an element to an array

I have the following array:
var as = Array.empty[Tuple2[Int, Int]]
I am adding an element to it like this:
var nElem = Tuple2(current, current)
as += nElem
current is a var of type Int
However, I am getting this error:
Solution.scala:51: error: type mismatch;
found : (Int, Int)
required: String
as += nElem
I don't understand why this is appearing. I haven't declared a String anywhere.
+= is the string concatenation operator.
You are looking for :+ to append to an array. Note, that Array length is immutable, so :+=, will return a new array, with the nElem appended, and assign it to the as variable, the original array will stay unchanged (take this as a hint, that you are likely doing something in a suboptimal way).
Note, that if you find yourself using var, that is almost always a sign of a bad design in your code. Mutable objects and variables are considered really bad taste in functional programming. Sometimes, you can't get away without using them, but those are rare corner cases. Most of the time, you should not need mutability.
Also, do not use Tuple2. Just do Array.empty[(Int, Int)], nElem = (current, current) etc.
Use a :+= to modify the variable in place. However, remember this: Using both var and a mutable data structure at the same time (like Array) is a sign of really bad programming. Either is sometimes fine, though.
However, note that this operation is O(n), therefore pushing n elements like that is going to be slow, O(n²). Arrays are not meant to have elements pushed to back like that. You can alternatively use a var Vector instead and call .toArray() on it at the end or use a mutable val ArrayBuffer. However, prefer functional style of programming, unless it produces less readable code.
Also, avoid typing Tuple2 explicitly. Use Array.empty[(Int, Int)] and var nElem = (current, current).
The semantics of + are weird because of the automatic conversion to String in certain cases. To append to an array, use the :+ method:
as :+= nElem

How to remove duplicates from collection (without creating new ones in-between)?

So first up, I'm fully aware mutation is a bad idea, but I need to keep object creation down to a minimum as I have an incredibly huge amount of data to process (keeps GC hang time down and speeds up my code).
What I want is a scala collection that has a method like distinct or similar, or possibly a library or code snippet (but native scala preferred) such that the method is side effecting / mutating the collection rather than creating a new collection.
I've explored the usual suspects like ArrayBuffer, mutable.List, Array, MutableList, Vector and they all "create a new sequence" from the original rather than mutate the original in place. Am I trying to find something that does not exist? Will I just have to write my own??
I think this exists in C++ http://www.cplusplus.com/reference/algorithm/unique/
Also, mega mega bonus points if there is some kind of awesome tail recursive way of doing this so that any bookkeeping structures created are kept in a single stack frame that is thus deallocated from memory once the method exits. The reason this would be uber cool is then even if the method creates some instances of things in order to perform the removal of duplicates, those instance will not need to be garbage collected and therefore not contribute to massive GC hangs. It doesn't have to be recursion, as long as it's likely to cause the objects to go on the stack (see escape analysis here http://www.ibm.com/developerworks/java/library/j-jtp09275/index.html)
(Also if I can specify and fix the capacity (size in memory) of the collection that would also be great)
The algorithm (for C++), you mentioned is for consecutive duplicates. So if you need it for consecutive, you could use some LinkedList, but mutable lists was deprecated. On the other hand if you want something memory-efficient and agree with linear access - you could wrap your collection (mutable or immutable) with distinct iterator (O(N)):
def toConsDist[T](c: Traversable[T]) = new Iterator[T] {
val i = c.toIterator
var prev: Option[T] = None
var _nxt: Option[T] = None
def nxt = {
if (_nxt.isEmpty) _nxt = i.find(x => !prev.toList.contains(x))
prev = _nxt
_nxt
}
def hasNext = nxt.nonEmpty
def next = {
val next = nxt.get
_nxt = None
next
}
}
scala> toConsDist(List(1,1,1,2,2,3,3,3,2,2)).toList
res44: List[Int] = List(1, 2, 3, 2)
If you need to remove all duplicates, it will be О(N*N), but you can't use scala collections for that, because of https://github.com/scala/scala/commit/3cc99d7b4aa43b1b06cc837a55665896993235fc (see LinkedList part), https://stackoverflow.com/a/27645224/1809978.
But you may use Java's LinkedList:
import scala.collection.JavaConverters._
scala> val mlist = new java.util.LinkedList[Integer]
mlist: java.util.LinkedList[Integer] = []
scala> mlist.asScala ++= List(1,1,1,2,2,3,3,3,2,2)
res74: scala.collection.mutable.Buffer[Integer] = Buffer(1, 1, 1, 2, 2, 3, 3, 3, 2, 2)
scala> var i = 0
i: Int = 0
scala> for(x <- mlist.asScala){ if (mlist.indexOf(x) != i) mlist.set(i, null); i+=1} //O(N*N)
scala> while(mlist.remove(null)){} //O(N*N)
scala> mlist
res77: java.util.LinkedList[Integer] = [1, 2, 3]
mlist.asScala just creates wrapper without any copying. You can't modify Java's LinkedList during iteration, that's why i used null's. You may try Java ConcurrentLinkedQueue, but it doesn't support indexOf, so you will have to implement it by yourself (scala maps it to the Iterator, so asScala.indexOf won't work).
By definition, immutability forces you to create new objects whenever you want to change your collection.
What Scala provides for some collection are buffers which allow you to build a collection using a mutable interface and finally returning a immutable version but once you got your immutable collection you can't change its references in any way, that includes filtering as distinct. The furthest point you can reach concerning mutability in an immutable collection is changing its elements state when these are mutable objects.
On the other hand, some collections as Vector are implemented as trees (in this case as a trie) and insert or delete operations are implemented not by copying the entire tree but just the required branches.
From Martin Ordesky's Programming in Scala:
Updating an element in the middle of a vector can be done by copying
the node that contains the element, and every node that points to it,
starting from the root of the tree. This means that a functional
update creates between one and five nodes that each contain up to 32
elements or subtrees. This is certainly more expensive than an
in-place update in a mutable array, but still a lot cheaper than
copying the whole vector.