Is there any "good" code pattern for a Class initializing and populating private mutable Maps, and then exposing them as immutable ones? Or should I just eternally regret my functional misconduct in such cases?
In a certain class, I am initializing some Maps as mutable ones, because the logic for initializing them does not fit very naturally, in this one case, with a purely immutable creation approach. Or maybe I was just too lazy to model it immutably.
Right now I end up with ugly Scala code: after all the initialization computation, I copy-convert the mutable Maps into immutable ones (mostly through .toMap). This is ugly because (1) the code has twice the Maps, and the doubled naming feels a bit off, and (2) the conversion lines look more involved than I'd hope for.
Additionally, (3) I dislike that the definitions of the resulting immutable Maps can now only sit at the bottom of the code, since they can only be declared after the initialization computation (or can they be made lazy and moved to the top? still not entirely elegant).
Any way to elegantly wrap up around mutable Maps initialization code?
Something like:
scala> class X {
| private val mb = collection.immutable.Map.newBuilder[String, Int]
| def m = mb.result
| mb += ("a" -> 1) // stuff
| }
defined class X
scala> new X().m
res0: scala.collection.immutable.Map[String,Int] = Map(a -> 1)
I think using vars of immutables rather than vals of mutables, and evolving the var collections according to my initialization logic, can be the optimal pattern wherever applicable. No duplicate collections, no code to convert from mutable to immutable, clear type definitions at the top of the class...
However, it is my understanding that this more functional style trades off some runtime efficiency, as mutable collections can offer better performance while the modification logic that builds them is running.
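For illustration, a minimal sketch of that pattern (the loop body here is just stand-in initialization logic):

class X {
  private var m0: Map[String, Int] = Map.empty   // immutable Map, type declared up top
  for (i <- 1 to 3) m0 += (s"key$i" -> i)        // evolve the var during initialization
  def m: Map[String, Int] = m0                   // expose only the immutable result
}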
Consider the following:
case class Node(var left: Option[Node], var right: Option[Node])
It's easy to see how you could traverse this, search it, whatever. But now imagine you did this:
val root = Node(None, None)
root.left = root
Now, this is bad, catastrophic. In fact, if you type it into a REPL, you'll get a StackOverflow (hey, that would be a good name for a band!) and a stack trace a thousand lines long. If you want to try it, do this:
{ root.left = root }: Unit
to suppress the REPL's well-intentioned attempt to print out the result.
But to construct that, I had to specifically give the case class mutable members, something I would never do in real life. If I use ordinary immutable members, I run into a problem with construction. The closest I can come is
case class Node(left: Option[Node], right: Option[Node])
val root: Node = Node(Some(root), None)
Then root has the rather ugly value Node(Some(null),None), but it's still not cyclic.
So my question is, if a data-structure is transitively immutable (that is, all of its members are either immutable values or references to other data-structures that are themselves transitively immutable), is it guaranteed to be acyclic?
It would be cool if it were.
Yes, it is possible to create cyclic data structures even with purely immutable data structures in a pure, referentially transparent, effect-free language.
The "obvious" solution is to pull out the potentially cyclic references into a separate data structure. For example, if you represent a graph as an adjacency matrix, then you don't need cycles in your data structure to represent cycles in your graph. But that's cheating: every problem can be solved by adding a layer of indirection (except the problem of having too many layers of indirection).
Another cheat would be to circumvent Scala's immutability guarantees from the outside, e.g. on the default Scala-JVM implementation by using Java reflection methods.
It is possible to create actual cyclic references. The technique is called Tying the Knot, and it relies on laziness: you can actually set the reference to an object that you haven't created yet because the reference will be evaluated lazily, by which time the object will have been created. Scala has support for laziness in various forms: lazy vals, by-name parameters, and the now deprecated DelayedInit. Also, you can "fake" laziness using functions or methods: wrap the thing you want to make lazy in a function or method which produces the thing, and it won't be created until you call the function or method.
So, the same techniques should be possible in Scala as well.
How about using a lazy val with call-by-name parameters?
scala> class Node(l: => Node, r: => Node, v: Int)
// defined class Node
scala> lazy val root: Node = new Node(root, root, 5)
// root: Node = <lazy>
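Building on that snippet, here's a sketch (the left/right accessors are my addition) showing that the resulting structure really is cyclic:

class Node(l: => Node, r: => Node, val v: Int) {
  def left: Node = l    // expose the by-name parameters so the cycle can be followed
  def right: Node = r
}

lazy val root: Node = new Node(root, root, 5)

root.left eq root   // true: following left from root leads back to root itself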
I have a DAO object which I defined as a case class.
import java.io.File

case class StudentDAO(id: Int) {
  def getGPA: Double = ???      // Expensive database lookup goes here
  def getRank: Int = ???        // Another expensive database operation and computation goes here
  def getScoreCard: File = ???  // Expensive file lookup goes here
}
I would naturally make getGPA and getRank and getScoreCard defs and not vals because I don't want them to be computed before they may be used.
What would be the performance impact if I marked these methods as lazy vals instead of defs? The reason I want to make them lazy vals is: I do not want to recompute the rank each time for a Student with id "i".
I am hoping that this will not be marked as duplicate because there are several questions as below which are mostly about differences:
When to use val, def, and lazy val in Scala?
def or val or lazy val for grammar rules?
`def` vs `val` vs `lazy val` evaluation in Scala
Scala Lazy Val Question
This question is mainly aimed at the costs (the trade-off between CPU and memory) of making a method a lazy val for expensive operations, and which one you would suggest over the other, and why.
EDIT: Thank you for the comment, @om-nom-nom. I should have been clearer about what I was looking for.
I read here:
Use of lazy val for caching string representation
that the string representation of the object is cached (see @Dave Griffith's answer). More precisely, I am looking at the impact on garbage collection if I make these lazy vals instead of defs.
Seems pretty straightforward to me:
I don't want them to be computed before they may be
used.
[...]
I do not want to recompute the rank each time for a Student with id "i".
Then use lazy val and that's it.
def is used when the value may change on each call, typically because you pass parameters; a val won't change, but it is computed right away.
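For illustration, a rough sketch of the difference in evaluation (Student and expensiveLookup are made-up stand-ins for the question's DAO and its database call):

class Student(id: Int) {
  private def expensiveLookup(): Int = { println("computing rank..."); id * 42 }   // stand-in for the DB call

  def rankDef: Int = expensiveLookup()        // recomputed (and re-printed) on every access
  lazy val rankLazy: Int = expensiveLookup()  // computed on first access, then cached for this instance
}

val s = new Student(1)
s.rankDef; s.rankDef     // prints "computing rank..." twice
s.rankLazy; s.rankLazy   // prints it only once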
A lazy val for an "ordinary" reference type (e.g., File) has the effect of creating a strong reference the first time it is evaluated. Thus, while it will avoid re-evaluations of an unchanging value, it has the obvious cost of keeping the computed value in memory.
For primitive values (or even lightweight objects, like File), this memory cost usually isn't much of an issue (unless you're holding lots of Student objects in memory). For a heavy reference, though (e.g., a large data structure), you might be better off using a weak reference, some other caching approach, or just computing the value on-demand.
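A hedged sketch of the weak-reference option mentioned above, reworking getScoreCard from the question (loadScoreCard is a made-up stand-in for the expensive file lookup; not thread-safe, just the idea):

import java.io.File
import java.lang.ref.WeakReference

class StudentDAO(id: Int) {
  private[this] var scoreCardRef = new WeakReference[File](null)

  def getScoreCard: File = {
    val cached = scoreCardRef.get()
    if (cached != null) cached
    else {
      val f = loadScoreCard(id)             // recomputed only if the cached value was collected
      scoreCardRef = new WeakReference(f)   // the GC may reclaim f under memory pressure
      f
    }
  }

  private def loadScoreCard(i: Int): File = new File(s"/tmp/scorecard-$i")   // made-up stand-in
}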
Coming from a Java background, I always mark instance variables as private. I'm learning Scala, and in almost all of the code I have viewed, the val/var members have default (public) access. Why is this the default? Doesn't it break the information hiding/encapsulation principle?
It would help if you specified which code, but keep in mind that some example code is in a simplified form to highlight whatever it is that the example is supposed to show you. Since the default access is public, that means the modifiers often get left off for simplicity.
That said, since a val is immutable, there's not much harm in leaving it public as long as you recognize that this is now part of the API for your class. That can be perfectly okay:
class DataThingy(data: Array[Double]) {
val sum = data.sum
}
Or it can be an implementation detail that you shouldn't expose:
class Statistics(data: Array[Double]) {
val sum = data.sum
val sumOfSquares = data.map(x => x*x).sum
val expectationSquared = (sum * sum)/(data.length*data.length)
val expectationOfSquare = sumOfSquares/data.length
val varianceOfSample = expectationOfSquare - expectationSquared
val standardDeviation = math.sqrt(data.length*varianceOfSample/(data.length-1))
}
Here, we've littered our class with all of the intermediate steps for calculating standard deviation. And this is especially foolish given that this is not the most numerically stable way to calculate standard deviation with floating point numbers.
Rather than merely making all of these private, it is better style, if possible, to use local blocks or private[this] defs to perform the intermediate computations:
val sum = data.sum
val standardDeviation = {
val sumOfSquares = ...
...
math.sqrt(...)
}
or
val sum = data.sum
private[this] def findSdFromSquares(s: Double, ssq: Double) = { ... }
val standardDeviation = findSdFromSquares(sum, data.map(x => x*x).sum)
If you need to store a calculation for later use, then private val or private[this] val is the way to go, but if it's just an intermediate step on the computation, the options above are better.
Likewise, there's no harm in exposing a var if it is a part of the interface--a vector coordinate on a mutable vector for instance. But you should make them private (better yet: private[this], if you can!) when it's an implementation detail.
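For instance, a small sketch of that distinction (Particle and its members are made up):

class Particle(var x: Double, var y: Double) {   // the coordinates are part of the mutable interface
  private[this] var moveCount = 0L               // bookkeeping detail, hidden from users

  def move(dx: Double, dy: Double): Unit = {
    x += dx; y += dy
    moveCount += 1
  }

  def moves: Long = moveCount                    // read access without exposing the var itself
}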
One important difference between Java and Scala here is that in Java you can not replace a public variable with getter and setter methods (or vice versa) without breaking source and binary compatibility. In Scala you can.
So in Java if you have a public variable, the fact that it's a variable will be exposed to the user and if you ever change it, the user has to change his code. In Scala you can replace a public var with a getter and setter method (or a public val with just a getter method) without the user ever knowing the difference. So in that sense no implementation details are exposed.
As an example, let's consider a rectangle class:
class Rectangle(val width: Int, val height:Int) {
val area = width * height
}
Now what happens if we later decide that we don't want the area to be stored as a variable, but rather it should be calculated each time it's called?
In Java the situation would be like this: If we had used a getter method and a private variable, we could just remove the variable and change the getter method to calculate the area instead of using the variable. No changes to user code needed. But since we've used a public variable, we are now forced to break user code :-(
In Scala it's different: we can just change the val to def and that's it. No changes to user code needed.
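Sketched out, the change is just this (same Rectangle as above, with area now a def):

class Rectangle(val width: Int, val height: Int) {
  def area = width * height   // now computed on each access instead of stored
}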
Actually, some Scala developers tend to use default access too much. But you can find appropriate examples in well-known Scala projects (for example, Twitter's Finagle).
On the other hand, creating objects as immutable values is the standard way in Scala. We don't need to hide all the attributes if they're completely immutable.
I'd like to answer the question with a somewhat more general approach. I think the answer you are looking for has to do with the design paradigms on which Scala is built. Instead of the classical procedural / object-oriented approach you see in Java, functional programming is used to a much greater extent. I cannot cover all the code that you mention, of course, but in general (well-written) Scala code will not need a lot of mutability.
As pointed out by Rex, vals are immutable, so there are few reasons for them not to be public. But as I see it, the immutability is not a goal in itself, but a result of functional programming. So if we consider functions as something like x -> function -> y, the function part becomes somewhat of a black box; we don't really care what it does, as long as it does it correctly. As the Haskell Wiki writes:
Purely functional programs typically operate on immutable data. Instead of altering existing values, altered copies are created and the original is preserved.
This also explains the reduced need for encapsulation, since the parts we traditionally wanted to hide away are executed inside the functions and are thus hidden anyway.
So, to cut things short, I would argue that mutability and heavy encapsulation have become largely redundant in Scala. And why clutter things up with getters and setters when they can be avoided?
My present use case is pretty trivial; either a mutable or an immutable Map will do the trick.
I have a method that takes an immutable Map, which then calls a 3rd-party API method that takes an immutable Map as well:
def doFoo(foo: String = "default", params: Map[String, Any] = Map()) {
val newMap =
if(someCondition) params + ("foo" -> foo) else params
api.doSomething(newMap)
}
The Map in question will generally be quite small, at most there might be an embedded List of case class instances, a few thousand entries max. So, again, assume little impact in going immutable in this case (i.e. having essentially 2 instances of the Map via the newMap val copy).
Still, it nags me a bit, copying the map just to get a new map with a few k->v entries tacked onto it.
I could go mutable and use params.put("bar", bar), etc. for the entries I want to tack on, and then params.toMap to convert back to an immutable Map for the API call; that is an option (roughly as sketched below). But then I have to import and pass around mutable maps, which is a bit of a hassle compared to going with Scala's default immutable Map.
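For concreteness, roughly what that mutable alternative would look like (someCondition and api are the same undefined stand-ins as in the snippet above):

import scala.collection.mutable

def doFoo(foo: String = "default", params: Map[String, Any] = Map()): Unit = {
  val buf = mutable.Map[String, Any]()
  buf ++= params
  if (someCondition) buf += ("foo" -> foo)
  api.doSomething(buf.toMap)   // converted back because the API wants an immutable Map
}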
So, what are the general guidelines for when it is justified/good practice to use mutable Map over immutable Maps?
Thanks
EDIT
So, it appears that an add operation on an immutable Map takes near-constant time, confirming @dhg's and @Nicolas's assertion that a full copy is not made, which solves the problem for the concrete case presented.
Depending on the immutable Map implementation, adding a few entries may not actually copy the entire original Map. This is one of the advantages to the immutable data structure approach: Scala will try to get away with copying as little as possible.
This kind of behavior is easiest to see with a List. If I have a val a = List(1,2,3), then that list is stored in memory. However, if I prepend an additional element like val b = 0 :: a, I do get a new 4-element List back, but Scala did not copy the original list a. Instead, we just created one new link, called it b, and gave it a pointer to the existing List a.
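That sharing is directly observable:

val a = List(1, 2, 3)
val b = 0 :: a     // a new 4-element list...
b.tail eq a        // ...whose tail is literally the existing list a (true), not a copy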
You can envision strategies like this for other kinds of collections as well. For example, if I add one element to a Map, the collection could simply wrap the existing map, falling back to it when needed, all while providing an API as if it were a single Map.
Using a mutable object is not bad in itself; it becomes bad in a functional programming environment, where you try to avoid side effects by keeping functions pure and objects immutable.
However, if you create a mutable object inside a function and modify this object, the function is still pure if you don't release a reference to this object outside the function. It is acceptable to have code like:
def buildVector( x: Double, y: Double, z: Double ): Vector[Double] = {
val ary = Array.ofDim[Double]( 3 )
ary( 0 ) = x
ary( 1 ) = y
ary( 2 ) = z
ary.toVector
}
Now, I think this approach is useful/recommended in two cases: (1) Performance, if creating and modifying an immutable object is a bottleneck of your whole application; (2) Code readability, because sometimes it's easier to modify a complex object in place (rather than resorting to lenses, zippers, etc.)
In addition to dhg's answer, you can take a look at the performance characteristics of the Scala collections. If an add/remove operation doesn't take linear time, it must be doing something other than simply copying the entire structure. (Note that the converse is not true: it's not because an operation takes linear time that you're copying the whole structure.)
I like to use collection.Map as the declared parameter type (for inputs or return values) rather than the mutable or immutable variants. collection.Map is a read-only interface that works for both kinds of implementation. A consumer method using a map really doesn't need to know which implementation it is or how it was constructed. (It's really none of its business anyway.)
If you go with the approach of hiding a map's particular construction (be it mutable or immutable) from the consumers who use it, then you still get an essentially immutable map downstream. And by using collection.Map as the interface, you completely remove the ".toMap" inefficiency that you would have with consumers written against immutable.Map-typed objects. Having to convert a completely constructed map into another one simply to comply with an interface the first one doesn't support is absolutely unnecessary overhead when you think about it.
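For example, a small sketch of this style (describe is a made-up consumer method):

def describe(params: collection.Map[String, Int]): String =
  params.map { case (k, v) => s"$k=$v" }.mkString(", ")

describe(collection.immutable.Map("a" -> 1))   // either flavor can be passed...
describe(collection.mutable.Map("b" -> 2))     // ...with no .toMap conversion needed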
I suspect that a few years from now we'll look back at the three separate sets of interfaces (mutable maps, immutable maps, and collection maps) and realize that 99% of the time only two are really needed (mutable and collection), and that using the (unfortunately) default immutable Map interface really adds a lot of unnecessary overhead for the "Scalable Language".
Why is there no mutable TreeMap in Scala? Is it lack of time, some technical problem, or is there a reason why it should not exist?
It's just a missing case that will presumably eventually be filled in. There is no reason not to do it, and in certain cases it would be considerably faster than the immutable tree (since modifications require log(n) object creations with an immutable tree and only 1 with a mutable tree).
Edit: and in fact it was filled in in 2.12.
Mutable TreeMap.
(There is a corresponding Set also.)
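For reference, a minimal usage sketch of the now-available mutable variant (assuming Scala 2.12 or later):

import scala.collection.mutable

val m = mutable.TreeMap[String, Int]()
m += ("aa" -> 2)
m += ("cc" -> 3)
m   // TreeMap(aa -> 2, cc -> 3), kept sorted by key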
Meanwhile you can use the Java TreeMap, which is exactly what you need.
val m = new java.util.TreeMap[String, Int]()
m.put("aa", 2)
m.put("cc", 3)
I assume that the reason is that having a mutable variant doesn't bring a big benefit. There are some cases, mentioned in the other answers, where a mutable map could be a bit more efficient, for example when replacing an already existing value: a mutable variant would save the creation of new nodes, but the complexity would still be O(log n).
If you want to keep a shared reference to the map, you can use ImmutableMapAdaptor which wraps any immutable map into a mutable structure.
You'll notice that TreeSet doesn't have a mutable equivalent either. That's because they share the common base class RedBlack, and the underlying data structure that keeps the trees ordered by elements or keys is a red-black tree. I don't know too much about this data structure, but it's pretty complex (insertion and removal are fairly expensive compared to other Maps), so I assume that had something to do with a mutable variant not being included.
Basically, it's probably because the underlying data structure isn't readily mutable, so TreeMap isn't either. So, to answer your question, it's a technical problem. It can definitely be done, though; there's just not much of a use case for it.
There may be performance reasons for a mutable TreeMap, but usually you can use an immutable map in the same way as you would a mutable one. You just have to assign it to a var rather than a val. It would be the same as for HashMap, which has mutable and immutable variants:
val mh = collection.mutable.HashMap[Int, Int]()
var ih = collection.immutable.HashMap[Int, Int]()
mh += (1 -> 2)
ih += (1 -> 2)
mh // scala.collection.mutable.HashMap[Int,Int] = Map(1 -> 2)
ih // scala.collection.immutable.HashMap[Int,Int] = Map(1 -> 2)