Scala lazy collection growth

This question is a bit more theoretical.
I have an object which holds a private mutable list or map, whose growth is append-only. I believe I could argue that the object itself is functional, being referentially transparent and, to all appearances, immutable. For example, I have:
import scala.collection.mutable.Map
case class Foo(bar:String)
object FooContainer {
private val foos = Map.empty[String, Foo]
def getFoo(fooName:String) = foos.getOrElseUpdate(fooName, Foo(fooName))
}
I could do something similar with lists. Now, I imagine that to be truly functional I would need to ensure thread safety, which I suppose I could do simply by using locks, synchronized blocks, or atomic references. My questions are: is all this necessary and sufficient, is it common practice, and are there established patterns for this behaviour?
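For comparison, one commonly used standard-library option is a concurrent map. The following is a minimal sketch, assuming Scala 2.13 or 3, of FooContainer rebuilt on top of scala.collection.concurrent.TrieMap. It relies on putIfAbsent, which is atomic, so concurrent callers asking for the same name all end up with the same Foo instance. It is one pattern among several, not the only answer to the question.

```scala
import scala.collection.concurrent.TrieMap

final case class Foo(bar: String)

object FooContainer {
  // TrieMap is the standard library's lock-free concurrent map.
  private val foos = TrieMap.empty[String, Foo]

  def getFoo(fooName: String): Foo =
    foos.get(fooName) match {
      case Some(foo) => foo // already cached: no allocation
      case None =>
        val candidate = Foo(fooName)
        // putIfAbsent is atomic and returns the existing value, if any,
        // so every caller ends up with the same instance.
        foos.putIfAbsent(fooName, candidate).getOrElse(candidate)
    }
}
```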

I would say that "functional" is really the wrong term. The exact example shown above could be called "pure" in the sense that its output is only ever defined by its input, thus not having any (visible) side-effects. In reality it is impure though, since it has hidden internal state.
Wikipedia defines a pure function as follows: "The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices."
The apparent purity vanishes, though, if you make the Foo class mutable:
case class Foo(bar:String) {
private var mutable:Int = 1
def setFoo(x:Int) = mutable = x
def foo = mutable
}
The mutability of the Foo class results in getFoo being impure:
scala> FooContainer.getFoo("test").foo
res5: Int = 1
scala> FooContainer.getFoo("bla").setFoo(5)
scala> FooContainer.getFoo("bla").foo
res7: Int = 5
In order for the function to be apparently pure, FooContainer.getFoo("bla").foo must always return the same value. Therefore, the "purity" of the described construct is fragile at best.

As a general rule, you're better off with a var holding an immutable collection, which is replaced when updated, than with a val holding a mutable one. The former never deals in values that may change after being created (other than the object that holds the evolving reference itself). You then only need to be moderately careful about when updated values are published to the var, and since writing a reference is atomic on the JVM (unlike, for example, non-volatile writes of a long or a double, which may be split in two on a 32-bit machine), there's little risk.
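A minimal sketch of that advice applied to the FooContainer from the question, assuming a JVM target: the immutable Map is published through a @volatile var so that readers always see a fully built map, while the check-then-insert step is serialized with synchronized so that two writers cannot both miss and overwrite each other's update. The size method is an illustrative addition.

```scala
final case class Foo(bar: String)

object FooContainer {
  // The var holds an immutable Map (Predef's default Map); the map is
  // never modified in place, only replaced. @volatile makes each newly
  // published map visible to all threads.
  @volatile private var foos: Map[String, Foo] = Map.empty

  def getFoo(fooName: String): Foo =
    foos.get(fooName) match {
      case Some(foo) => foo // fast path: plain read, no lock
      case None =>
        // Check-then-insert must be atomic, so writers take a lock and
        // re-check in case another thread inserted in the meantime.
        synchronized {
          foos.get(fooName) match {
            case Some(foo) => foo
            case None =>
              val foo = Foo(fooName)
              foos = foos.updated(fooName, foo)
              foo
          }
        }
    }

  def size: Int = foos.size
}
```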


Is it possible for a structure made of immutable types to have a cycle?

Consider the following:
case class Node(var left: Option[Node], var right: Option[Node])
It's easy to see how you could traverse this, search it, whatever. But now imagine you did this:
val root = Node(None, None)
root.left = Some(root)
Now, this is bad, catastrophic. In fact, if you type it into a REPL, you'll get a StackOverflow (hey, that would be a good name for a band!) and a stack trace a thousand lines long. If you want to try it, do this:
{ root.left = Some(root) }: Unit
to suppress the REPL's well-intentioned attempt to print out the result (printing calls the case class's toString, which recurses forever through root.left).
But to construct that, I had to specifically give the case class mutable members, something I would never do in real life. If I use ordinary immutable members, I get a problem with construction. The closest I can come is
case class Node(left: Option[Node], right: Option[Node])
val root: Node = Node(Some(root), None)
Then root has the rather ugly value Node(Some(null),None), but it's still not cyclic.
So my question is, if a data-structure is transitively immutable (that is, all of its members are either immutable values or references to other data-structures that are themselves transitively immutable), is it guaranteed to be acyclic?
It would be cool if it were.
Yes, it is possible to create cyclic data structures even with purely immutable data structures in a pure, referentially transparent, effect-free language.
The "obvious" solution is to pull out the potentially cyclic references into a separate data structure. For example, if you represent a graph as an adjacency matrix, then you don't need cycles in your data structure to represent cycles in your graph. But that's cheating: every problem can be solved by adding a layer of indirection (except the problem of having too many layers of indirection).
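To make the indirection concrete, here is a small sketch (the Graph class and its onCycle method are hypothetical, not from any library): the edges live in an immutable Map[Int, Set[Int]], so the data structure itself contains no reference cycle, yet the graph it describes has the cycle 1 -> 2 -> 3 -> 1.

```scala
// Nodes are plain Ints; edges live in an immutable adjacency map.
final case class Graph(edges: Map[Int, Set[Int]]) {
  def successors(n: Int): Set[Int] = edges.getOrElse(n, Set.empty)

  // Follows edges from `start`; true if `start` can be reached again.
  def onCycle(start: Int): Boolean = {
    @annotation.tailrec
    def loop(frontier: List[Int], seen: Set[Int]): Boolean = frontier match {
      case Nil => false
      case n :: rest =>
        val next = successors(n)
        if (next.contains(start)) true
        else {
          val fresh = next.diff(seen) // visit each node at most once
          loop(fresh.toList ++ rest, seen ++ fresh)
        }
    }
    loop(List(start), Set(start))
  }
}
```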
Another cheat would be to circumvent Scala's immutability guarantees from the outside, e.g. on the default Scala-JVM implementation by using Java reflection methods.
It is possible to create actual cyclic references. The technique is called Tying the Knot, and it relies on laziness: you can actually set the reference to an object that you haven't created yet because the reference will be evaluated lazily, by which time the object will have been created. Scala has support for laziness in various forms: lazy vals, by-name parameters, and the now deprecated DelayedInit. Also, you can "fake" laziness using functions or methods: wrap the thing you want to make lazy in a function or method which produces the thing, and it won't be created until you call the function or method.
So, the same techniques should be possible in Scala as well.
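As a sketch of the function-based variant (the Node class and the Ring object are hypothetical): each node stores a thunk for its neighbour, so the neighbour is only looked up when next is called, by which time both vals in the object have been initialized. This relies on object initialization order; inside a local block, the forward reference to b would be rejected by the compiler.

```scala
// `next0` is a thunk, so the neighbour is only looked up
// when `next` is called.
final class Node(val value: Int, next0: () => Node) {
  def next: Node = next0()
}

object Ring {
  // `b` is referenced before it is defined, but only inside a function
  // that is not called until after the object has been initialized.
  val a: Node = new Node(1, () => b)
  val b: Node = new Node(2, () => a)
}
```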
How about using lazy with call-by-name?
scala> class Node(l: => Node, r: => Node, v: Int)
// defined class Node
scala> lazy val root: Node = new Node(root, root, 5)
// root: Node = <lazy>
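The Node above does not keep its by-name arguments, so the cycle exists but cannot be traversed. A minimal sketch (names are hypothetical) that stores each by-name argument in a lazy val, so the neighbours are evaluated on first access and then cached:

```scala
// By-name arguments are captured and only evaluated, once,
// when the corresponding lazy val is first read.
final class Node(val value: Int, l: => Node, r: => Node) {
  lazy val left: Node = l
  lazy val right: Node = r
}

object Knot {
  // A single node whose children are itself, as in the REPL example.
  lazy val root: Node = new Node(5, root, root)

  // Two nodes that point at each other.
  lazy val a: Node = new Node(1, b, b)
  lazy val b: Node = new Node(2, a, a)
}
```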

Can Try be lazy or eager in Scala?

AFAIK, Iterator.map is lazy while Vector.map is eager, basically because they are different types of monads.
I would like to know if there is any chance of having an EagerTry and a LazyTry that behave just like the current Try, but with the latter (LazyTry) delaying the execution of the passed closure until the result is needed (if it is needed).
Please note that declaring things as lazy doesn't work too well in Scala here; in particular, it only works within a given scope. An alternative exists when passing parameters (by-name parameters). The question is how to achieve lazy behaviour when returning (lazy) values to an outer scope. Option is basically a collection of length 0 or 1; this would be the equivalent of lazy collections (Iterator, Sequence), but limited to length 0 or 1 (like Option and Either). I'm particularly interested in Try, i.e. using LazyTry exactly as Try would be used. I guess this should be analogous in the other cases (Option and Either).
Please note that we already have EagerTry, as the current standard Try is eager. Unfortunately, the class is sealed; therefore, to have eager and lazy versions of the same class, we would need to define all three of them and implement two of them (as opposed to defining and implementing one). The point is to return a Try without other software layers having to worry about when that code is executed, i.e. abstraction.
Yes, it wouldn't be hard to write LazyTry. One possible approach:
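import scala.util.Try

sealed class LazyTry[A](block: => A) {
  // the only place block is used; Try(block) runs at most once, on first access
  private lazy val underlying: Try[A] = Try(block)
  def get: A = underlying.get
  def isSuccess: Boolean = underlying.isSuccess
  // ... (other Try methods delegate to underlying the same way)
}
object LazyTry {
  def apply[A](block: => A): LazyTry[A] = new LazyTry[A](block)
  // ...
}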
Note that you don't have LazySuccess and LazyFailure, since you don't know which class to use before running block.
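A slightly fuller sketch along the same lines, assuming only scala.util.Try from the standard library; the combinators shown (map, flatMap, recover, getOrElse, toTry) are illustrative choices, not a complete mirror of Try's API. Each combinator returns a new LazyTry whose block calls get on the original, so nothing runs until a result is actually demanded, and failures propagate because get rethrows and the outer Try catches the exception again.

```scala
import scala.util.Try

// Hypothetical fuller LazyTry: the block runs at most once, on first demand.
final class LazyTry[A] private (block: => A) {
  // The only place `block` is evaluated; the outcome is memoized.
  private lazy val underlying: Try[A] = Try(block)

  def get: A = underlying.get
  def isSuccess: Boolean = underlying.isSuccess
  def isFailure: Boolean = underlying.isFailure
  def toTry: Try[A] = underlying
  def getOrElse[B >: A](default: => B): B = underlying.getOrElse(default)

  // Combinators build new LazyTry values without forcing this one.
  def map[B](f: A => B): LazyTry[B] = new LazyTry(f(get))
  def flatMap[B](f: A => LazyTry[B]): LazyTry[B] = new LazyTry(f(get).get)
  def recover[B >: A](pf: PartialFunction[Throwable, B]): LazyTry[B] =
    new LazyTry(underlying.recover(pf).get)
}

object LazyTry {
  def apply[A](block: => A): LazyTry[A] = new LazyTry(block)
}
```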

def vs lazy val in case class

I have a DAO object which I defined as a case class.
import java.io.File

case class StudentDAO(id: Int) {
  def getGPA: Double = ??? // Expensive database lookup goes here
  def getRank: Int = ??? // Another expensive database operation and computation goes here
  def getScoreCard: File = ??? // Expensive file lookup goes here
}
I would naturally make getGPA and getRank and getScoreCard defs and not vals because I don't want them to be computed before they may be used.
What would be the performance impact if I marked these methods as lazy vals instead of defs? The reason I want to make them lazy vals is: I do not want to recompute the rank each time for a Student with id "i".
I hope this will not be marked as a duplicate: there are several related questions, listed below, but they are mostly about the differences:
When to use val, def, and lazy val in Scala?
def or val or lazy val for grammar rules?
`def` vs `val` vs `lazy val` evaluation in Scala
Scala Lazy Val Question
This question is mainly about the costs (the CPU vs. memory trade-off) of making a method a lazy val for expensive operations: which would one suggest over the other, and why?
EDIT: Thank you for the comment, @om-nom-nom. I should have been clearer about what I was looking for.
I read in "Use of lazy val for caching string representation" that the string representation of the object is cached (see @Dave Griffith's answer). More precisely, I am looking at the impact on garbage collection if I made it a lazy val instead of a def.
Seems pretty straightforward to me:
I don't want them to be computed before they may be used.
[...]
I do not want to recompute the rank each time for a Student with id "i".
Then use lazy val and that's it.
Use def when the value may change on each call, typically because you pass parameters; a val won't change, but it will be computed right away.
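A small sketch of the difference, with a hypothetical FakeDb object standing in for the expensive lookups and counting how often they run: the lazy val is computed once per StudentDAO instance, the def on every call. Since the cache lives in the instance, two separately created StudentDAO(7) objects each perform their own lookup.

```scala
// Hypothetical stand-in for the expensive lookups; `calls` counts
// how often the "database" is actually hit.
object FakeDb {
  var calls = 0
  def rankFor(id: Int): Int = { calls += 1; id % 100 }
  def gpaFor(id: Int): Double = { calls += 1; 3.5 }
}

final case class StudentDAO(id: Int) {
  // Evaluated on first access, then cached in this instance.
  lazy val rank: Int = FakeDb.rankFor(id)

  // Re-evaluated on every call.
  def gpa: Double = FakeDb.gpaFor(id)
}
```

Each read of a lazy val also goes through an initialization check, a cost that is usually negligible next to a database call.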
A lazy val for an "ordinary" reference type (e.g., File) has the effect of creating a strong reference the first time it is evaluated. Thus, while it will avoid re-evaluations of an unchanging value, it has the obvious cost of keeping the computed value in memory.
For primitive values (or even lightweight objects, like File), this memory cost usually isn't much of an issue (unless you're holding lots of Student objects in memory). For a heavy reference, though (e.g., a large data structure), you might be better off using a weak reference, some other caching approach, or just computing the value on-demand.
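If the cached value is heavy, one option along the lines suggested above is a java.lang.ref.SoftReference, which the JDK documentation describes as most often used for memory-sensitive caches (a WeakReference is cleared at any collection once nothing else strongly references the value, which makes it a poor cache). The SoftCached class below is a hypothetical helper, a sketch rather than a hardened cache: the garbage collector may clear the value under memory pressure, in which case it is simply recomputed, and two threads racing on an empty cache may both compute it.

```scala
import java.lang.ref.SoftReference

// Keeps the computed value behind a SoftReference, so the GC may
// reclaim it under memory pressure; it is then recomputed on demand.
final class SoftCached[A <: AnyRef](compute: () => A) {
  @volatile private var ref: Option[SoftReference[A]] = None

  def get: A =
    ref.flatMap(r => Option(r.get())) match {
      case Some(value) => value // still cached
      case None =>              // never computed, or cleared by the GC
        val fresh = compute()
        ref = Some(new SoftReference(fresh))
        fresh
    }
}
```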

Why not mark val or var variables as private?

Coming from a Java background, I always mark instance variables as private. I'm learning Scala, and in almost all of the code I have seen, val/var members have default (public) access. Why is this the default? Doesn't it break the information hiding/encapsulation principle?
It would help if you specified which code, but keep in mind that some example code is simplified to highlight whatever the example is supposed to show you. Since the default access is public, the modifiers are often left off for simplicity.
That said, since a val is immutable, there's not much harm in leaving it public as long as you recognize that this is now part of the API for your class. That can be perfectly okay:
class DataThingy(data: Array[Double]) {
val sum = data.sum
}
Or it can be an implementation detail that you shouldn't expose:
class Statistics(data: Array[Double]) {
val sum = data.sum
val sumOfSquares = data.map(x => x*x).sum
val expectationSquared = (sum * sum)/(data.length*data.length)
val expectationOfSquare = sumOfSquares/data.length
val varianceOfSample = expectationOfSquare - expectationSquared
val standardDeviation = math.sqrt(data.length*varianceOfSample/(data.length-1))
}
Here, we've littered our class with all of the intermediate steps for calculating standard deviation. And this is especially foolish given that this is not the most numerically stable way to calculate standard deviation with floating point numbers.
Rather than merely making all of these private, it is better style, if possible, to use local blocks or private[this] defs to perform the intermediate computations:
val sum = data.sum
val standardDeviation = {
val sumOfSquares = ...
...
math.sqrt(...)
}
or
val sum = data.sum
private[this] def findSdFromSquares(s: Double, ssq: Double) = { ... }
val standardDeviation = findSdFromSquares(sum, data.map(x => x*x).sum)
If you need to store a calculation for later use, then private val or private[this] val is the way to go, but if it's just an intermediate step on the computation, the options above are better.
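For reference, a complete version of the local-block approach, as a sketch that keeps the same textbook formula as the original Statistics class (numerical-stability caveats included): only sum and standardDeviation become fields, and every intermediate value stays local to the initializer block.

```scala
class Statistics(data: Array[Double]) {
  // Public API: only the results we want to expose.
  val sum: Double = data.sum

  val standardDeviation: Double = {
    // Intermediate values are local to this block, so they are
    // neither fields nor part of the class's interface.
    val n = data.length
    val sumOfSquares = data.map(x => x * x).sum
    val expectationSquared = (sum * sum) / (n.toDouble * n)
    val expectationOfSquare = sumOfSquares / n
    val varianceOfSample = expectationOfSquare - expectationSquared
    math.sqrt(n * varianceOfSample / (n - 1))
  }
}
```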
Likewise, there's no harm in exposing a var if it is a part of the interface--a vector coordinate on a mutable vector for instance. But you should make them private (better yet: private[this], if you can!) when it's an implementation detail.
One important difference between Java and Scala here is that in Java you cannot replace a public variable with getter and setter methods (or vice versa) without breaking source and binary compatibility. In Scala you can.
So in Java if you have a public variable, the fact that it's a variable will be exposed to the user and if you ever change it, the user has to change his code. In Scala you can replace a public var with a getter and setter method (or a public val with just a getter method) without the user ever knowing the difference. So in that sense no implementation details are exposed.
As an example, let's consider a rectangle class:
class Rectangle(val width: Int, val height:Int) {
val area = width * height
}
Now what happens if we later decide that we don't want the area to be stored as a variable, but rather it should be calculated each time it's called?
In Java the situation would be like this: If we had used a getter method and a private variable, we could just remove the variable and change the getter method to calculate the area instead of using the variable. No changes to user code needed. But since we've used a public variable, we are now forced to break user code :-(
In Scala it's different: we can just change the val to def and that's it. No changes to user code needed.
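A minimal sketch of that uniform access, with a hypothetical Shape trait added so that both variants can be used interchangeably: a val can implement an abstract def, and client code reads r.area identically whichever one is used.

```scala
trait Shape {
  def area: Int
}

// Version 1: area is computed once, at construction, and stored.
class StoredRectangle(val width: Int, val height: Int) extends Shape {
  val area: Int = width * height
}

// Version 2: area is recomputed on every access.
class ComputedRectangle(val width: Int, val height: Int) extends Shape {
  def area: Int = width * height
}

object Client {
  // Client code reads `area` the same way in both cases.
  def totalArea(shapes: Seq[Shape]): Int = shapes.map(_.area).sum
}
```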
Actually, some Scala developers tend to overuse default access. But you can find appropriate examples in well-known Scala projects (for example, Twitter's Finagle).
On the other hand, creating objects as immutable values is the standard way in Scala. We don't need to hide all the attributes if they're completely immutable.
I'd like to answer the question with a somewhat more general approach. I think the answer you are looking for has to do with the design paradigms on which Scala is built. Instead of the classical procedural / object-oriented approach you see in Java, functional programming is used to a much greater extent. I cannot cover all the code that you mention, of course, but in general (well-written) Scala code will not need a lot of mutability.
As pointed out by Rex, vals are immutable, so there are few reasons for them not to be public. But as I see it, immutability is not a goal in itself but a result of functional programming. So if we consider functions as something like x -> function -> y, the function part becomes somewhat of a black box; we don't really care what it does, as long as it does it correctly. As the Haskell Wiki puts it:
Purely functional programs typically operate on immutable data. Instead of altering existing values, altered copies are created and the original is preserved.
This also explains the missing closure, since the parts we traditionally wanted to hide away are executed in the functions and thus hidden anyway.
So, to cut things short, I would argue that mutability and closure have become more redundant in Scala. And why clutter things up with getters and setters when it can be avoided?

val-mutable versus var-immutable in Scala

Are there any guidelines in Scala on when to use val with a mutable collection versus using var with an immutable collection? Or should you really aim for val with an immutable collection?
The fact that there are both types of collection gives me a lot of choice, and often I don't know how to make that choice.
Pretty common question, this one. The hard thing is finding the duplicates.
You should strive for referential transparency. What that means is that, if I have an expression "e", I could make a val x = e and replace e with x. This is the property that mutability breaks. Whenever you need to make a design decision, maximize for referential transparency.
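A small illustration of that substitution test (all names are hypothetical): with an immutable Vector, naming the expression (v :+ 4).length with a val and using the val twice gives the same result as writing the expression twice; with a mutable ArrayBuffer, evaluating (buf += 4).length twice appends twice, so the substitution changes the result.

```scala
import scala.collection.mutable.ArrayBuffer

object RefTransparency {
  // Immutable: evaluating the expression changes nothing, so it can be
  // replaced by a val holding its value.
  val v = Vector(1, 2, 3)
  val twiceImmutable: Int = (v :+ 4).length + (v :+ 4).length // 4 + 4
  val xImm: Int = (v :+ 4).length
  val substitutedImmutable: Int = xImm + xImm // 4 + 4

  // Mutable: each evaluation appends to the buffer, so substitution fails.
  val buf = ArrayBuffer(1, 2, 3)
  val twiceMutable: Int = (buf += 4).length + (buf += 4).length // 4 + 5
  val buf2 = ArrayBuffer(1, 2, 3)
  val xMut: Int = (buf2 += 4).length
  val substitutedMutable: Int = xMut + xMut // 4 + 4
}
```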
As a practical matter, a method-local var is the safest var that exists, since it doesn't escape the method. If the method is short, even better. If it isn't, try to reduce it by extracting other methods.
On the other hand, a mutable collection has the potential to escape, even if it doesn't. When changing code, you might then want to pass it to other methods, or return it. That's the kind of thing that breaks referential transparency.
On an object (a field), pretty much the same thing happens, but with more dire consequences. Either way the object will have state and, therefore, break referential transparency. But having a mutable collection means even the object itself might lose control of who's changing it.
If you work with immutable collections and you need to "modify" them, for example, add elements to them in a loop, then you have to use vars because you need to store the resulting collection somewhere. If you only read from immutable collections, then use vals.
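For instance, a method-local var holding an immutable List, as a sketch (squaresUpTo is a hypothetical helper): every iteration builds a new list and rebinds the var, and the var never escapes the method.

```scala
object Squares {
  def squaresUpTo(n: Int): List[Int] = {
    // Method-local var holding an immutable List: each iteration creates
    // a new list and rebinds acc; no list is ever modified in place.
    var acc = List.empty[Int]
    for (i <- 1 to n) acc = (i * i) :: acc
    acc.reverse
  }
}
```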
In general, make sure that you don't confuse references and objects. vals are immutable references (constant pointers in C). That is, when you use val x = new MutableFoo(), you'll be able to change the object that x points to, but you won't be able to change to which object x points. The opposite holds if you use var x = new ImmutableFoo(). Picking up my initial advice: if you don't need to change to which object a reference points, use vals.
The best way to answer this is with an example. Suppose we have some process simply collecting numbers for some reason. We wish to log these numbers, and will send the collection to another process to do this.
Of course, we are still collecting numbers after we send the collection to the logger. And let's say there is some overhead in the logging process that delays the actual logging. Hopefully you can see where this is going.
If we store this collection in a mutable val (mutable because we are continuously adding to it), the process doing the logging will be looking at the same object that is still being updated by our collection process. That collection may be updated at any time, so when it's time to log, we may not actually be logging the collection we sent.
If we use an immutable var, we send an immutable data structure to the logger. When we add more numbers to our collection, we replace our var with a new immutable data structure. This doesn't mean the collection sent to the logger is replaced! The logger still references the collection it was sent, so it will indeed log the collection it received.
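The scenario can be simulated in a single thread with a hypothetical DelayedLogger that receives a collection immediately but only reads it when flush is called; real asynchronous logging would involve another thread, but the aliasing problem is the same.

```scala
import scala.collection.mutable.{ArrayBuffer, ListBuffer}

object LoggerDemo {
  // Receives a collection now, but only reads it later, on flush().
  final class DelayedLogger {
    private val pending = ListBuffer.empty[() => String]
    def send(xs: Iterable[Int]): Unit = pending += (() => xs.mkString(","))
    def flush(): List[String] = pending.map(f => f()).toList
  }

  def withMutableVal(): List[String] = {
    val logger = new DelayedLogger
    val numbers = ArrayBuffer(1, 2, 3)
    logger.send(numbers) // the logger holds the same mutable object
    numbers += 4         // ...which we keep modifying
    logger.flush()
  }

  def withImmutableVar(): List[String] = {
    val logger = new DelayedLogger
    var numbers = Vector(1, 2, 3)
    logger.send(numbers)   // the logger holds this particular Vector
    numbers = numbers :+ 4 // rebinding our var does not affect it
    logger.flush()
  }
}
```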
I think the examples in the blog post "importance of immutability for concurrency" will shed more light, as the question of which combination to use becomes even more important in concurrency scenarios. And while we're at it, note the preferred use of synchronized vs. @volatile vs. something like AtomicReference, as discussed in "three tools".
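As a sketch of the AtomicReference option (NumberCollector is hypothetical), an immutable Vector can be swapped atomically with updateAndGet from java.util.concurrent.atomic.AtomicReference; readers take snapshots that later additions never change. Because updateAndGet may retry the update function under contention, that function must be free of side effects. This assumes Scala 2.12 or later for the lambda-to-UnaryOperator conversion.

```scala
import java.util.concurrent.atomic.AtomicReference

final class NumberCollector {
  // An immutable Vector, replaced atomically on every update.
  private val numbers = new AtomicReference(Vector.empty[Int])

  // updateAndGet may call the function more than once under contention,
  // so it must only compute the new value, never cause side effects.
  def add(n: Int): Unit = numbers.updateAndGet(_ :+ n)

  // Readers get an immutable snapshot that later adds cannot change.
  def snapshot: Vector[Int] = numbers.get()
}
```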
var immutable vs. val mutable
In addition to the many excellent answers to this question, here is a simple example that illustrates the potential dangers of val mutable:
Mutable objects can be modified inside methods that take them as parameters, while reassignment is not allowed.
import scala.collection.mutable.ArrayBuffer
object MyObject {
def main(args: Array[String]): Unit = {
val a = ArrayBuffer(1,2,3,4)
silly(a)
println(a) // a has been modified here
}
def silly(a: ArrayBuffer[Int]): Unit = {
a += 10
println(s"length: ${a.length}")
}
}
Result:
length: 5
ArrayBuffer(1, 2, 3, 4, 10)
Something like this cannot happen with var immutable, because reassignment is not allowed:
object MyObject {
def main(args: Array[String]): Unit = {
var v = Vector(1,2,3,4)
silly(v)
println(v)
}
def silly(v: Vector[Int]): Unit = {
v = v :+ 10 // This line is not valid
println(s"length of v: ${v.length}")
}
}
Results in:
error: reassignment to val
Since function parameters are treated as val this reassignment is not allowed.