Idiomatic way of modifying an immutable collection - scala

I'm interested in understanding the pattern of modifying an immutable collection in the most effective way.
Suppose for example I want to define the following class (an actual implementation may overload the operators, but this is just for illustration). Is this the best way (with respect to performance / safety)?
import scala.collection.immutable.HashMap
class Example[A, B] {
  var col = new HashMap[A, B]()
  def add(k: A, v: B): Unit = col = col + (k -> v)
  def remove(k: A): Unit = col = col - k
}
Would this approach also be possible, in some way I'm missing?
class Example[A, B] extends HashMap[A, B] {
  def add(k: A, v: B): Unit = ???
  def remove(k: A): Unit = ???
}

The comments above are correct. Immutable data structures exist and there are ways to work with them like this; however, since you are keeping a var anyway, it seems you may just want to stick with a mutable structure.
Check out chapter 17 of the free online PDF of Programming in Scala, where Odersky discusses designing and building an immutable queue.
The gist of it is that you never actually modify the structure you're trying to change; you take it in, look at it, and build a new one based on the old one plus whatever change you're trying to make.
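A rough sketch of that idea (my own minimal version, not the book's implementation; SimpleQueue is just a name I made up): every "modifying" operation returns a new queue value instead of updating fields.

// Minimal immutable FIFO queue: enqueue and dequeue build new queues
// from the old one rather than changing it in place.
class SimpleQueue[A] private (front: List[A], back: List[A]) {
  def enqueue(a: A): SimpleQueue[A] = new SimpleQueue(front, a :: back)

  def dequeue: (A, SimpleQueue[A]) = front match {
    case x :: rest            => (x, new SimpleQueue(rest, back))
    case Nil if back.nonEmpty => new SimpleQueue(back.reverse, Nil).dequeue
    case Nil                  => throw new NoSuchElementException("dequeue on empty queue")
  }
}

object SimpleQueue {
  def empty[A]: SimpleQueue[A] = new SimpleQueue(Nil, Nil)
}

val q = SimpleQueue.empty[Int].enqueue(1).enqueue(2)
val (first, rest) = q.dequeue // first == 1; q itself is untouched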
It's the same idea that applies to lists:
val list = List(1,2,3)
1 :: list
println(list)
Will print out List(1,2,3) as opposed to List(1,1,2,3)
While:
val list = List(1,2,3)
val list1 = 1 :: list
println(list1)
is what prints List(1,1,2,3)
Here's a nice slide deck that discusses some popular data structures and their functional counterparts. Functional Data Structures

It sounds like you're trying to make immutable data structures mutable. There are certainly situations where you need to think about performance, but given that Scala has persistent data structures, I'd focus on the use case rather than the performance model.
val a = Map("key1" -> "some value")
val b = a + ("key2" -> "some other value")
Now b contains both entries.
If you actually need a mutable structure for your case, just use mutable.Map.
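For instance, a rough sketch of the original Example class written against a mutable map (MutableExample is just an illustrative name, assuming the same add/remove interface):

import scala.collection.mutable

class MutableExample[A, B] {
  // a mutable.Map is updated in place, so no reassignment (and no var) is needed
  private val col = mutable.Map.empty[A, B]

  def add(k: A, v: B): Unit = col += (k -> v)
  def remove(k: A): Unit = col -= k
  def get(k: A): Option[B] = col.get(k)
}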

Related

Yield mutable.Seq from mutable.Traversable type in Scala

I have a variable underlying of type Option[mutable.Traversable[Field]]
All I wanted to do in my class was provide a method to return this as a Seq, in the following way:
def toSeq: scala.collection.mutable.Seq[Field] = {
  for {
    f <- underlying.get
  } yield f
}
This fails, complaining that mutable.Traversable does not conform to mutable.Seq. All it is doing is yielding something of type Field, so in my mind this should work?
A possible solution to this is:
def toSeq: Seq[Field] = {
  underlying match {
    case Some(x) => x.toSeq
    case None =>
  }
}
Although I have no idea what is actually happening when x.toSeq is called, and I imagine there is more memory being used here than actually required to accomplish this.
An explanation or suggestion would be much appreciated.
I am confused about why you say "I imagine there is more memory being used here than actually required". Scala will not copy your Field values when doing x.toSeq; it simply creates a new Seq which holds pointers to the same Field values that underlying points to. Since this new structure is exactly what you want, there is no avoiding the additional memory associated with the extra pointers (but the amount of additional memory should be small). For a more in-depth discussion see the wiki on persistent data structures.
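A quick way to convince yourself of this (a sketch, using a hypothetical stand-in for your Field class and assuming Scala 2.x collections):

import scala.collection.mutable

case class Field(name: String) // hypothetical stand-in for the real Field

val underlying: Option[mutable.Traversable[Field]] =
  Some(mutable.ArrayBuffer(Field("a"), Field("b")))

// toSeq builds a new container, but the elements inside it are the
// very same Field objects, not copies:
val asSeq = underlying.get.toSeq
println(asSeq.head eq underlying.get.head) // true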
Regarding your possible solution, it could be slightly modified to get the result you're expecting:
def toSeq: Seq[Field] =
  underlying
    .map(_.toSeq)
    .getOrElse(Seq.empty[Field])
This solution will return an empty Seq if underlying is a None, which is safer than your original attempt using get. I say it's "safer" because get throws a NoSuchElementException if the Option is a None, whereas this toSeq can never fail to return a valid value.
Functional Approach
As a side note: when I first started programming in Scala I would write many functions of the form:
def formatSeq(seq : Seq[String]) : Seq[String] =
seq map (_.toUpperCase)
This is less functional because you are expecting a particular collection type, e.g. formatSeq won't work on a Future.
I have found that a better approach is to write:
def formatStr(str : String) = str.toUpperCase
Or my preferred coding style:
val formatStr = (_ : String).toUpperCase
Then the user of your function can apply formatStr in any fashion they want and you don't have to worry about all of the collection casting:
val fut : Future[String] = ???
val formatFut = fut map formatStr
val opt : Option[String] = ???
val formatOpt = opt map formatStr

Migrate from MurmurHash to MurmurHash3

In Scala 2.10, MurmurHash is deprecated for some reason, with a note saying I should use MurmurHash3 now. But the API is different, and there are no useful scaladocs for MurmurHash3 -> fail.
For instance, current code:
trait Foo {
  type Bar
  def id: Int
  def path: Bar

  override def hashCode = {
    import util.MurmurHash._
    var h = startHash(2)
    val c = startMagicA
    val k = startMagicB
    h = extendHash(h, id, c, k)
    h = extendHash(h, path.##, nextMagicA(c), nextMagicB(k))
    finalizeHash(h)
  }
}
How would I do this using MurmurHash3 instead? This needs to be a fast operation, preferably without allocations, so I do not want to construct a Product, Seq, Array[Byte] or whatever MurmurHash3 seems to be offering me.
The MurmurHash3 algorithm was changed, confusingly, from an algorithm that mixed in its own salt, essentially (c and k), to one that just does more bit-mixing. The basic operation is now mix, which you should fold over all your values, after which you should call finalizeHash (the Int length argument is a convenience to help distinguish collections of different lengths). If you replace your last mix with mixLast, it's a little faster and removes redundancy with finalizeHash. If it takes too long to work out which mix is the last one, just use mix.
Typically for a collection you'll want to mix in an extra value to indicate what type of collection it is.
So minimally you'd have
override def hashCode = finalizeHash(mixLast(id, path.##), 0)
and "typically" you'd
// Pick any string or number that suits you, put in companion object
val fooSeed = MurmurHash3.stringHash("classOf[Foo]")
// I guess "id" plus "path" is two things?
override def hashCode = finalizeHash(mixLast(mix(fooSeed, id), path.##), 2)
Note that the length field is NOT there to give a high-quality hash that mixes in that number. All mixing of important hash values should be done with mix.
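As an illustration of the "fold mix over all your values" pattern described above (hashValues is my own sketch, not something the library provides):

import scala.util.hashing.MurmurHash3._

// Hash a sequence of pre-computed element hash codes by folding `mix`
// over them, then finishing with finalizeHash and the element count.
def hashValues(seed: Int, hashes: Seq[Int]): Int =
  finalizeHash(hashes.foldLeft(seed)((h, x) => mix(h, x)), hashes.length)

// e.g. hashValues(stringHash("MyCollection"), elems.map(_.##))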
Looking at the source code of MurmurHash3 suggests something like this:
override def hashCode = {
  import util.hashing.MurmurHash3._
  val h = symmetricSeed // I'm not sure which seed to use here
  val h1 = mix(h, id)
  val h2 = mixLast(h1, path.##)
  finalizeHash(h2, 2)
}
or, in (almost) one line:
import util.hashing.MurmurHash3._
override def hashCode = finalizeHash(mix(mix(symmetricSeed, id), path.##), 2)

What are good examples of: "operation of a program should map input values to output values rather than change data in place"

I came across this sentence about Scala, explaining its functional behavior:
operation of a program should map input values to output values rather than change data in place
Could somebody explain it with a good example?
Edit: Please explain or give an example for the above sentence in its context; please do not overcomplicate it and cause more confusion.
The most obvious pattern this is referring to is the difference between how you would write collection code in Java compared with Scala. If you were writing Scala but in the idiom of Java, you would work with collections by mutating data in place. The idiomatic Scala code to do the same would favour mapping input values to output values.
Let's have a look at a few things you might want to do to a collection:
Filtering
In Java, if I have a List<Trade> and I am only interested in those trades executed with Deutsche Bank, I might do something like:
for (Iterator<Trade> it = trades.iterator(); it.hasNext();) {
    Trade t = it.next();
    if (t.getCounterparty() != DEUTSCHE_BANK) it.remove(); // MUTATION
}
Following this loop, my trades collection only contains the relevant trades. But I have achieved this using mutation: a careless programmer could easily miss that trades was an input parameter or an instance variable, or was used elsewhere in the method. As such, it is quite possible their code is now broken. Furthermore, such code is extremely brittle to refactor for the same reason; a programmer wishing to refactor a piece of code must be very careful not to let mutated collections escape the scope in which they are intended to be used and, vice versa, not to accidentally use an un-mutated collection where they should have used a mutated one.
Compare with Scala:
val db = trades filter (_.counterparty == DeutscheBank) //MAPPING INPUT TO OUTPUT
This creates a new collection! It doesn't affect anyone who is looking at trades and is inherently safer.
Mapping
Suppose I have a List<Trade> and I want to get a Set<Stock> for the unique stocks which I have been trading. Again, the idiom in Java is to create a collection and mutate it.
Set<Stock> stocks = new HashSet<Stock>();
for (Trade t : trades) stocks.add(t.getStock()); //MUTATION
Using Scala, the correct thing to do is to map the input collection and then convert it to a set:
val stocks = (trades map (_.stock)).toSet //MAPPING INPUT TO OUTPUT
Or, if we are concerned about performance:
(trades.view map (_.stock)).toSet
(trades.iterator map (_.stock)).toSet
What are the advantages here? Well:
My code can never observe a partially-constructed result
The application of a function A => B to a Coll[A] to get a Coll[B] is clearer.
Accumulating
Again, in Java the idiom has to be mutation. Suppose we are trying to sum the decimal quantities of the trades we have done:
BigDecimal sum = BigDecimal.ZERO;
for (Trade t : trades) {
    sum.add(t.getQuantity()); //MUTATION
}
Again, we must be very careful not to accidentally observe a partially-constructed result! In scala, we can do this in a single expression:
val sum = (0 /: trades)(_ + _.quantity) //MAPPING INPUT TO OUTPUT
Or the various other forms:
trades.foldLeft(0)(_ + _.quantity)
(trades.iterator map (_.quantity)).sum
(trades.view map (_.quantity)).sum
Oh, by the way, there is a bug in the Java implementation! Did you spot it?
I'd say it's the difference between:
var counter = 0
def updateCounter(toAdd: Int): Unit = {
  counter += toAdd
}
updateCounter(8)
println(counter)
and:
val originalValue = 0
def addToValue(value: Int, toAdd: Int): Int = value + toAdd
val firstNewResult = addToValue(originalValue, 8)
println(firstNewResult)
This is a gross oversimplification, but fuller examples are things like using foldLeft to build up a result rather than doing the hard work yourself, as sketched below.
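A minimal sketch of that idea (my own illustration, not the example the original link pointed to):

val words = List("map", "input", "to", "output")

// Build a single result value without mutating an accumulator by hand;
// the accumulator is threaded through the fold as a new value each step.
val totalLength = words.foldLeft(0)((acc, w) => acc + w.length)

// foldLeft can also build a new collection, e.g. reversing the list:
val reversed = words.foldLeft(List.empty[String])((acc, w) => w :: acc)

println(totalLength) // 16
println(reversed)    // List(output, to, input, map)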
What it means is that if you write pure functions like this you always get the same output from the same input, and there are no side effects, which makes it easier to reason about your programs and ensure that they are correct.
So, for example, the function:
def times2(x:Int) = x*2
is pure, while
import scala.collection.mutable.MutableList

def add5ToList(xs: MutableList[Int]) {
  xs += 5
}
is impure because it edits data in place as a side effect. This is a problem because that same list could be in use elsewhere in the program, and now we can't guarantee the behaviour because it has changed.
A pure version would use immutable lists and return a new list:
def add5ToList(xs: List[Int]) = {
  5 :: xs
}
There are plenty of examples with collections, which are easy to come by but might give the wrong impression. This concept works at all levels of the language (it doesn't at the VM level, however). One example is case classes. Consider these two alternatives:
// Java-style
class Person(initialName: String, initialAge: Int) {
  def this(initialName: String) = this(initialName, 0)

  private var name = initialName
  private var age = initialAge

  def getName = name
  def getAge = age
  def setName(newName: String) { name = newName }
  def setAge(newAge: Int) { age = newAge }
}
val employee = new Person("John")
employee.setAge(40) // we changed the object
// Scala-style
case class Person(name: String, age: Int) {
  def this(name: String) = this(name, 0)
}
val employee = new Person("John")
val employeeWithAge = employee.copy(age = 40) // employee still exists!
This concept is applied in the construction of the immutable collections themselves: a List never changes; instead, new List objects are created when necessary. The use of persistent data structures reduces the copying that would otherwise be required to get the same safety from a mutable data structure.
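A small illustration of that structural sharing with the standard immutable List:

val original = List(2, 3, 4)
val extended = 1 :: original // a new List, built in constant time

// None of the original cells were copied: the new list's tail
// is the very same object as the original list.
println(extended.tail eq original) // true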

Scala collection of elements accessible by name

I have some Scala code roughly analogous to this:
object Foo {
  val thingA = ...
  val thingB = ...
  val thingC = ...
  val thingD = ...
  val thingE = ...

  val thingsOfAKind = List(thingA, thingC, thingE)
  val thingsOfADifferentKind = List(thingB, thingD)
  val allThings = thingsOfAKind ::: thingsOfADifferentKind
}
Is there some nicer way of declaring a bunch of things and being able to access them both individually by name and collectively?
The issue I have with the code above is that the real version has almost 30 different things, and there's no way to actually ensure that each new thing I add is also added to an appropriate list (or that allThings doesn't end up with duplicates, although that's relatively easy to fix).
The various things are able to be treated in aggregate by almost all of the code base, but there are a few places and a couple of things where the individual identity matters.
I thought about just using a Map, but then the compiler loses the ability to check that the individual things being looked up actually exist (and I have to either wrap code to handle a failed lookup around every attempt, or ignore the problem and effectively risk null pointer exceptions).
I could make the kind each thing belongs to an observable property of the things; then I would at least have a single list of all things and could get lists of each kind with filter. But the core issue remains: I would ideally like to be able to declare that a thing exists, has a name (identifier), and is part of a collection.
What I effectively want is something like a compile-time Map. Is there a good way of achieving something like this in Scala?
How about this type of pattern?
class Things[A] {
  var all: List[A] = Nil
  def ->:(x: A): A = { all = x :: all; x }
}
object Test {
  val things1 = new Things[String]
  val things2 = new Things[String]

  val thingA = "A" ->: things1
  val thingB = "B" ->: things2
  val thingC = "C" ->: things1
  val thingD = ("D" ->: things1) ->: things2
}
You could also add a little sugar, making Things automatically convertible to List,
object Things {
  implicit def thingsToList[A](things: Things[A]): List[A] = things.all
}
I can't think of a way to do this without the var that has equally nice syntax.
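Used with the Test object above, you keep both the individual names and the collections; a quick sketch of what I'd expect (note that all is in reverse registration order, since each thing is prepended):

println(Test.thingA)      // A
println(Test.things1.all) // List(D, C, A)
println(Test.things2.all) // List(D, B)

// With the implicit conversion in scope, a Things can be used directly
// where a List is expected:
import Things.thingsToList
println(Test.things1.mkString(", ")) // D, C, A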

Why does Scala maintain the type of collection instead of returning Iterable (as in .NET)?

In Scala, you can do
val l = List(1, 2, 3)
l.filter(_ > 2) // returns a List[Int]
val s = Set("hello", "world")
s.map(_.length) // returns a Set[Int]
The question is: why is this useful?
Scala collections are probably the only existing collection framework that does this. The Scala community seems to agree that this functionality is needed. Yet no one seems to miss it in other languages. Example in C# (naming modified to match Scala's):
var l = new List<int> { 1, 2, 3 }
l.filter(i => i > 2) // always returns Iterable[Int]
l.filter(i => i > 2).toList // if I want a List, no problem
l.filter(i => i > 2).toSet // or I want a Set
In .NET, I always get back an Iterable and it is up to me what I want to do with it. (This also makes .NET collections very simple.)
The Scala example with Set forces me to make a Set of lengths out of a Set of strings. But what if I just want to iterate over the lengths, or construct a List of lengths, or keep the Iterable to filter it later? Constructing a Set right away seems pointless. (EDIT: collection.view provides the simpler .NET functionality, nice)
I am sure you will show me examples where the .NET approach is absolutely wrong or kills performance, but I just can't see any (using .NET for years).
Not a full answer to your question, but Scala never forces you to use one collection type over another. You're free to write code like this:
import collection._
import immutable._
val s = Set("hello", "world")
val l: Vector[Int] = s.map(_.length)(breakOut)
Read more about breakOut in Daniel Sobral's detailed answer to another question.
If you want your map or filter to be evaluated lazily, use this:
s.view.map(_.length)
This whole behavior makes it easy to integrate your new collection classes and inherit all the powerful capabilities of the standard collections with no code duplication, all of this ensuring that YourSpecialCollection#filter returns an instance of YourSpecialCollection; that YourSpecialCollection#map returns an instance of YourSpecialCollection if it supports the type being mapped to, or a built-in fallback collection if it doesn't (like what happens if you call map on a BitSet). Surely, a C# iterator has no .toMySpecialCollection method.
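For example, the BitSet fallback mentioned above looks roughly like this (the exact fallback type varies between Scala versions, so treat the result types as approximate):

import scala.collection.immutable.BitSet

val bits = BitSet(1, 2, 3)

// Mapping Int => Int can stay a BitSet:
val shifted = bits.map(_ + 1)     // BitSet(2, 3, 4)

// Mapping Int => String cannot be a BitSet, so the result falls back
// to an ordinary Set of Strings:
val labels = bits.map("bit " + _) // contains "bit 1", "bit 2", "bit 3"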
See also: “Integrating new sets and maps” in The Architecture of Scala Collections.
Scala follows the "uniform return type principle", assuring that you always end up with the appropriate return type instead of losing that information as in C#.
The reason C# does it this way is that its type system is not expressive enough to provide these assurances without overriding the whole implementation of every method in every single subclass. Scala solves this with higher-kinded types.
Why does Scala have the only collection framework doing this? Because it is harder than most people think, especially when things like Strings and Arrays, which are not "real" collections, should be integrated as well:
// This stays a String:
scala> "Foobar".map(identity)
res27: String = Foobar
// But this falls back to the "nearest" appropriate type:
scala> "Foobar".map(_.toInt)
res29: scala.collection.immutable.IndexedSeq[Int] = Vector(70, 111, 111, 98, 97, 114)
If you have a Set, and an operation on it returns an Iterable while its runtime type is still a Set, then you're losing important information about its behavior, as well as access to set-specific methods.
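A small example of the set-specific behaviour that would otherwise be lost (element uniqueness is preserved through map precisely because the result is still a Set):

val words = Set("hello", "world")

// Both words have length 5, so the resulting Set collapses them into one element.
val lengths = words.map(_.length) // Set(5)

// Set-specific behaviour remains available on the result:
println(lengths.contains(5)) // true
println(lengths.size)        // 1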
BTW: There are other languages behaving similarly, like Haskell, which influenced Scala a lot. The Haskell version of map would look like this translated to Scala (without implicit magic):
// the functor type class
trait Functor[C[_]] {
  def fmap[A, B](f: A => B, coll: C[A]): C[B]
}

// an instance
object ListFunctor extends Functor[List] {
  def fmap[A, B](f: A => B, list: List[A]): List[B] = list.map(f)
}

// usage
val list = ListFunctor.fmap((x: Int) => x * x, List(1, 2, 3))
And I think the Haskell community values this feature as well :-)
It is a matter of consistency. Things are what they are, and return things like them. You can depend on it.
The distinction you're drawing here is one of strictness. A strict method is evaluated immediately, while a non-strict method is only evaluated as needed. This has consequences. Take this simple example:
def print5(it: Iterable[Int]) = {
  var flag = true
  it.filter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}
Test it with these two collections:
print5(List.range(1, 10))
print5(Stream.range(1, 10))
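If I'm tracing it correctly, the two calls behave quite differently (worth running to confirm):

print5(List.range(1, 10))
// List's filter runs eagerly while flag is still true, so every element
// survives the filter and 1 through 9 are all printed.

print5(Stream.range(1, 10))
// Stream's filter is evaluated on demand, so once flag becomes false
// (after printing 5) the remaining elements are skipped: prints 1 through 5.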
Here, List is strict, so its methods are strict. Conversely, Stream is non-strict, so its methods are non-strict.
So this isn't really related to Iterable at all -- after all, both List and Stream are Iterable. Changing the collection return type can cause all sorts of problems -- at the very least, it would make the task of keeping a persistent data structure harder.
On the other hand, there are advantages to delaying certain operations, even on a strict collection. Here are some ways of doing it:
// Get an iterator explicitly, if it's going to be used only once
def print5I(it: Iterable[Int]) = {
  var flag = true
  it.iterator.filter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}

// Get a Stream explicitly, if the result will be reused
def print5S(it: Iterable[Int]) = {
  var flag = true
  it.toStream.filter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}

// Use a view, which provides non-strictness for some methods
def print5V(it: Iterable[Int]) = {
  var flag = true
  it.view.filter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}

// Use withFilter, which is explicitly designed to be used as a non-strict filter
def print5W(it: Iterable[Int]) = {
  var flag = true
  it.withFilter(_ => flag).foreach { i =>
    flag = i < 5
    println(i)
  }
}