When is it okay to use "var" in Scala? - scala

I know that Scala has var (for mutable state) but pure functional programming discourages use of any mutable state and rather focuses on using val for everything.
Coming from an imperative world it's hard to let go of mutable state.
My question is: when is it okay to use var in your Scala code? Can all code really be written using just val? If yes, then why does Scala have vars?

Here are some reasons for vars in Scala:
Scala is a multi-paradigm language, it encourages functional programming, but it leaves the choice to the programmer.
Compatibility: Many Java APIs expose mutable variables.
Performance: Sometimes using a var gives you the best possible performance.
When people say that everything can be done without vars, that is correct in the sense that Scala would still be Turing complete without vars. However, it doesn't change anything about the validity of the previous points.

Even from a functional programming point of view, you can use vars (or mutable objects) locally, if they don't leave the scope where they are defined.
For instance, consider this (contrived) function, which returns the size of a list:
def dumbSize(lst: List[Int]): Int = {
  var i = 0
  var rest = lst
  while (rest != Nil) {
    i += 1
    rest = rest.tail
  }
  i
}
Although this (ugly) function uses vars, it is still pure. There are no side effects and it will always return the same result for a given argument value.
Another example of "mutable-state-encapsulation" is the actor model, where actor state is often mutable.
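To make the contrast concrete, here is a minimal sketch that places the loop above next to a tail-recursive variant using no var at all. The names SizeDemo and recSize are illustrative assumptions, not part of the original answer. Both functions are pure from the caller's point of view.

```scala
import scala.annotation.tailrec

object SizeDemo {
  // Same as the function above: the vars never escape the method body.
  def dumbSize(lst: List[Int]): Int = {
    var i = 0
    var rest = lst
    while (rest != Nil) {
      i += 1
      rest = rest.tail
    }
    i
  }

  // Hypothetical val-only equivalent: the state is threaded through parameters.
  @tailrec
  def recSize(lst: List[Int], acc: Int = 0): Int = lst match {
    case Nil       => acc
    case _ :: tail => recSize(tail, acc + 1)
  }
}
```

A caller cannot observe any difference between the two, which is the sense in which the local vars are harmless.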

Is there a reason why assignments in scala evaluate to Unit? [duplicate]

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?
A common pattern in I/O programming is to do things like this:
while ((bytesRead = in.read(buffer)) != -1) { ...
But this is not possible in Scala because...
bytesRead = in.read(buffer)
.. returns Unit, not the new value of bytesRead.
Seems like an interesting thing to leave out of a functional language.
I am wondering why it was done so?
I advocated for having assignments return the value assigned rather than unit. Martin and I went back and forth on it, but his argument was that putting a value on the stack just to pop it off 95% of the time was a waste of byte-codes and had a negative impact on performance.
I'm not privy to inside information on the actual reasons, but my suspicion is very simple. Scala makes side-effectful loops awkward to use so that programmers will naturally prefer for-comprehensions.
It does this in many ways. For instance, you don't have a for loop where you declare and mutate a variable. You can't (easily) mutate state on a while loop at the same time you test the condition, which means you often have to repeat the mutation just before it, and at the end of it. Variables declared inside a while block are not visible from the while test condition, which makes do { ... } while (...) much less useful. And so on.
Workaround:
while ({bytesRead = in.read(buffer); bytesRead != -1}) { ...
For whatever it is worth.
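For a fuller picture, here is a sketch of that workaround inside a complete stream copy, next to a var-free alternative built on Iterator.continually. The object name CopyDemo, the method names, and the deliberately tiny buffer size are assumptions made for illustration.

```scala
import java.io.{InputStream, OutputStream}

object CopyDemo {
  // Block-as-condition workaround: assign first, then test the new value.
  def copyWithWhile(in: InputStream, out: OutputStream): Unit = {
    val buffer = new Array[Byte](4)
    var bytesRead = 0
    while ({ bytesRead = in.read(buffer); bytesRead != -1 }) {
      out.write(buffer, 0, bytesRead)
    }
  }

  // var-free alternative: a lazy iterator of read results, cut off at -1.
  def copyWithIterator(in: InputStream, out: OutputStream): Unit = {
    val buffer = new Array[Byte](4)
    Iterator
      .continually(in.read(buffer))
      .takeWhile(_ != -1)
      .foreach(n => out.write(buffer, 0, n))
  }
}
```

Because the iterator is lazy, each chunk is written before the next read overwrites the buffer.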
As an alternate explanation, perhaps Martin Odersky had to face a few very ugly bugs deriving from such usage, and decided to outlaw it from his language.
EDIT
David Pollack has answered with some actual facts, which are clearly endorsed by the fact that Martin Odersky himself commented on his answer, giving credence to the performance-related argument put forth by Pollack.
This happened as part of Scala having a more "formally correct" type system. Formally-speaking, assignment is a purely side-effecting statement and therefore should return Unit. This does have some nice consequences; for example:
class MyBean {
  private var internalState: String = _
  def state = internalState
  def state_=(state: String) = internalState = state
}
The state_= method returns Unit (as would be expected for a setter) precisely because assignment returns Unit.
I agree that for C-style patterns like copying a stream or similar, this particular design decision can be a bit troublesome. However, it's actually relatively unproblematic in general and really contributes to the overall consistency of the type system.
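As a rough sketch of how such a setter is used from the outside: writing bean.state = ... is rewritten by the compiler into a call to state_=. The wrapper object BeanDemo and the demo method are assumptions for illustration, and the field is initialised to an empty string here instead of `_` so the snippet also compiles cleanly on newer compilers.

```scala
object BeanDemo {
  class MyBean {
    private var internalState: String = ""
    def state: String = internalState
    // The body is an assignment, so the result type is Unit either way.
    def state_=(s: String): Unit = internalState = s
  }

  def demo(): String = {
    val bean = new MyBean
    bean.state = "ready" // desugars to bean.state_=("ready")
    bean.state
  }
}
```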
Perhaps this is due to the command-query separation principle?
CQS tends to be popular at the intersection of OO and functional programming styles, as it creates an obvious distinction between object methods that do or do not have side-effects (i.e., that alter the object). Applying CQS to variable assignments is taking it further than usual, but the same idea applies.
A short illustration of why CQS is useful: Consider a hypothetical hybrid F/OO language with a List class that has methods Sort, Append, First, and Length. In imperative OO style, one might want to write a function like this:
func foo(x):
    var list = new List(4, -2, 3, 1)
    list.Append(x)
    list.Sort()
    # list now holds a sorted, five-element list
    var smallest = list.First()
    return smallest + list.Length()
Whereas in more functional style, one would more likely write something like this:
func bar(x):
    var list = new List(4, -2, 3, 1)
    var smallest = list.Append(x).Sort().First()
    # list still holds an unsorted, four-element list
    return smallest + list.Length()
These seem to be trying to do the same thing, but obviously one of the two is incorrect, and without knowing more about the behavior of the methods, we can't tell which one.
Using CQS, however, we would insist that if Append and Sort alter the list, they must return the unit type, thus preventing us from creating bugs by using the second form when we shouldn't. The presence of side effects therefore also becomes implicit in the method signature.
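The same idea can be sketched in Scala. MutableInts below is a hypothetical class written only for this illustration (it is not a standard library type): its commands return Unit, so chaining them as in the second form would not compile. The immutable List variant shows the functional style, where the original list is left untouched.

```scala
object CqsDemo {
  // Hypothetical mutable list: commands mutate and return Unit.
  final class MutableInts(initial: Int*) {
    private var items: Vector[Int] = initial.toVector
    def append(x: Int): Unit = items = items :+ x
    def sort(): Unit = items = items.sorted
    def first: Int = items.head
    def length: Int = items.length
  }

  // Imperative style: the list ends up sorted and five elements long.
  def foo(x: Int): Int = {
    val list = new MutableInts(4, -2, 3, 1)
    list.append(x)
    list.sort()
    list.first + list.length
  }

  // Functional style on an immutable List: `list` keeps its four elements.
  def bar(x: Int): Int = {
    val list = List(4, -2, 3, 1)
    val smallest = (list :+ x).sorted.head
    smallest + list.length
  }
}
```

With x = 0, foo yields 3 and bar yields 2, and the Unit return types make it impossible to confuse the two styles by accident.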
I'd guess this is in order to keep the program / the language free of side effects.
What you describe is the intentional use of a side effect which in the general case is considered a bad thing.
It is not the best style to use an assignment as a boolean expression. You perform two things at the same time, which often leads to errors. And the accidental use of "=" instead of "==" is avoided with Scala's restriction.
By the way: I find the initial while-trick stupid, even in Java. Why not something like this?
for (int bytesRead = in.read(buffer); bytesRead != -1; bytesRead = in.read(buffer)) {
    // do something
}
Granted, the assignment appears twice, but at least bytesRead is in the scope it belongs to, and I'm not playing with funny assignment tricks...
You can have a workaround for this as long as you have a reference type for indirection. In a naïve implementation, you can use the following for arbitrary types.
case class Ref[T](var value: T) {
  def := (newval: => T)(pred: T => Boolean): Boolean = {
    this.value = newval
    pred(this.value)
  }
}
Then, under the constraint that you’ll have to use ref.value to access the reference afterwards, you can write your while predicate as
val bytesRead = Ref(0) // maybe there is a way to get rid of this line
while ((bytesRead := in.read(buffer)) (_ != -1)) { // ...
  println(bytesRead.value)
}
and you can do the checking against bytesRead in a more implicit manner without having to type it.

Why not mark val or var variables as private?

Coming from a Java background, I always mark instance variables as private. I'm learning Scala, and in almost all of the code I have viewed, the val/var members have default (public) access. Why is this the default? Does it not break the information hiding/encapsulation principle?
It would help if you specified which code, but keep in mind that some example code is in a simplified form to highlight whatever it is that the example is supposed to show you. Since the default access is public, that means that you often get the modifiers left off for simplicity.
That said, since a val is immutable, there's not much harm in leaving it public as long as you recognize that this is now part of the API for your class. That can be perfectly okay:
class DataThingy(data: Array[Double]) {
  val sum = data.sum
}
Or it can be an implementation detail that you shouldn't expose:
class Statistics(data: Array[Double]) {
  val sum = data.sum
  val sumOfSquares = data.map(x => x*x).sum
  val expectationSquared = (sum * sum)/(data.length*data.length)
  val expectationOfSquare = sumOfSquares/data.length
  val varianceOfSample = expectationOfSquare - expectationSquared
  val standardDeviation = math.sqrt(data.length*varianceOfSample/(data.length-1))
}
Here, we've littered our class with all of the intermediate steps for calculating standard deviation. And this is especially foolish given that this is not the most numerically stable way to calculate standard deviation with floating point numbers.
Rather than merely making all of these private, it is better style, if possible, to use local blocks or private[this] defs to perform the intermediate computations:
val sum = data.sum
val standardDeviation = {
  val sumOfSquares = ...
  ...
  math.sqrt(...)
}
or
val sum = data.sum
private[this] def findSdFromSquares(s: Double, ssq: Double) = { ... }
val standardDeviation = findSdFromSquares(sum, data.map(x => x*x).sum)
If you need to store a calculation for later use, then private val or private[this] val is the way to go, but if it's just an intermediate step on the computation, the options above are better.
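One possible way to spell out the local-block variant in full is sketched below. It reuses the formulas from the Statistics class above; the wrapper object StatsDemo and the local name n are assumptions added for illustration, and the caveat about numerical stability still applies.

```scala
object StatsDemo {
  class Statistics(data: Array[Double]) {
    val sum: Double = data.sum

    // The intermediates live only inside this block and never become members.
    val standardDeviation: Double = {
      val n = data.length
      val sumOfSquares = data.map(x => x * x).sum
      val expectationSquared = (sum * sum) / (n * n)
      val expectationOfSquare = sumOfSquares / n
      val varianceOfSample = expectationOfSquare - expectationSquared
      math.sqrt(n * varianceOfSample / (n - 1))
    }
  }
}
```

Only sum and standardDeviation are part of the class's API here; everything else is invisible to callers.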
Likewise, there's no harm in exposing a var if it is a part of the interface--a vector coordinate on a mutable vector for instance. But you should make them private (better yet: private[this], if you can!) when it's an implementation detail.
One important difference between Java and Scala here is that in Java you can not replace a public variable with getter and setter methods (or vice versa) without breaking source and binary compatibility. In Scala you can.
So in Java if you have a public variable, the fact that it's a variable will be exposed to the user and if you ever change it, the user has to change his code. In Scala you can replace a public var with a getter and setter method (or a public val with just a getter method) without the user ever knowing the difference. So in that sense no implementation details are exposed.
As an example, let's consider a rectangle class:
class Rectangle(val width: Int, val height: Int) {
  val area = width * height
}
Now what happens if we later decide that we don't want the area to be stored as a variable, but rather it should be calculated each time it's called?
In Java the situation would be like this: If we had used a getter method and a private variable, we could just remove the variable and change the getter method to calculate the area instead of using the variable. No changes to user code needed. But since we've used a public variable, we are now forced to break user code :-(
In Scala it's different: we can just change the val to def and that's it. No changes to user code needed.
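A small sketch of that change follows. The two class names RectangleV1 and RectangleV2 are assumptions that stand for the "before" and "after" versions of the same class; the point is that the calling code reads .area identically in both cases.

```scala
object RectangleDemo {
  // Before: area is computed once and stored.
  class RectangleV1(val width: Int, val height: Int) {
    val area: Int = width * height
  }

  // After: area is recomputed on every access.
  class RectangleV2(val width: Int, val height: Int) {
    def area: Int = width * height
  }

  // Caller code looks the same for both versions.
  def areas(r1: RectangleV1, r2: RectangleV2): (Int, Int) = (r1.area, r2.area)
}
```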
Actually, some Scala developers tend to use default access too much. But you can find appropriate examples in famous Scala projects (for example, Twitter's Finagle).
On the other hand, creating objects as immutable values is the standard way in Scala. We don't need to hide all the attributes if they're immutable completely.
I'd like to answer the question with a somewhat more generic approach. I think the answer you are looking for has to do with the design paradigms on which Scala is built. Instead of the classical procedural / object-oriented approach you see in Java, functional programming is used to a much greater extent. I cannot cover all the code that you mention, of course, but in general (well-written) Scala code will not need a lot of mutability.
As pointed out by Rex, vals are immutable, so there are few reasons for them not to be public. But as I see it, immutability is not a goal in itself, but a result of functional programming. So if we consider functions as something like x -> function -> y, the function part becomes somewhat of a black box; we don't really care what it does, as long as it does it correctly. As the Haskell Wiki writes:
Purely functional programs typically operate on immutable data. Instead of altering existing values, altered copies are created and the original is preserved.
This also explains the missing closure, since the parts we traditionally wanted to hide away are executed in the functions and thus hidden anyway.
So, to cut things short, I would argue that mutability and closure have become more redundant in Scala. And why clutter things up with getters and setters when they can be avoided?

How pure and lazy can Scala be?

This is just one of those "I was wondering..." questions.
Scala has immutable data structures and (optional) lazy vals etc.
How close can a Scala program be to one that is fully pure (in a functional programming sense) and fully lazy (or as Ingo points out, can it be sufficiently non-strict)? What values are unavoidably mutable and what evaluation unavoidably greedy?
Regarding laziness - currently, passing a parameter to a method is by default strict:
def square(a: Int) = a * a
but you can use call-by-name parameters:
def square(a: =>Int) = a * a
but this is not lazy in the sense that it computes the value only once when needed:
scala> square({println("calculating");5})
calculating
calculating
res0: Int = 25
There's been some work on adding lazy method parameters, but it hasn't been integrated yet (the declaration below should print "calculating" only once):
def square(lazy a: Int) = a * a
This is one piece that is missing, although you could simulate it with a local lazy val:
def square(ap: =>Int) = {
  lazy val a = ap
  a * a
}
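To check the difference, one can count how often the argument is evaluated. The following is a minimal sketch; the names LazyParamDemo, squareByName, squareOnce, runByName and runOnce are assumptions introduced for the experiment.

```scala
object LazyParamDemo {
  // Plain call-by-name: the argument is re-evaluated on every use.
  def squareByName(a: => Int): Int = a * a

  // Simulated lazy parameter: a local lazy val caches the first evaluation.
  def squareOnce(ap: => Int): Int = {
    lazy val a = ap
    a * a
  }

  // Each helper returns (result, number of times the argument was evaluated).
  def runByName(): (Int, Int) = {
    var count = 0
    val r = squareByName { count += 1; 5 }
    (r, count)
  }

  def runOnce(): (Int, Int) = {
    var count = 0
    val r = squareOnce { count += 1; 5 }
    (r, count)
  }
}
```

runByName() gives (25, 2), matching the two "calculating" lines in the REPL session above, while runOnce() gives (25, 1).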
Regarding mutability - there is nothing holding you back from writing immutable data structures and avoiding mutation. You can do this in Java or C as well. In fact, some immutable data structures rely on the lazy primitive to achieve better complexity bounds, but the lazy primitive can be simulated in other languages as well - at the cost of extra syntax and boilerplate.
You can always write immutable data structures, lazy computations and fully pure programs in Scala. The problem is that the Scala programming model allows writing non pure programs as well, so the type checker can't always infer some properties of the program (such as purity) which it could infer given that the programming model was more restrictive.
For example, in a language with pure expressions the a * a in the call-by-name definition above (a: =>Int) could be optimized to evaluate a only once, regardless of the call-by-name semantics. If the language allows side-effects, then such an optimization is not always applicable.
Scala can be as pure and lazy as you like, but a) the compiler won't keep you honest with regards to purity and b) it will take a little extra work to make it lazy. There's nothing too profound about this; you can even write lazy and pure Java code if you really want to (see here if you dare; achieving laziness in Java requires eye-bleeding amounts of nested anonymous inner classes).
Purity
Whereas Haskell tracks impurities via the type system, Scala has chosen not to go that route, and it's difficult to tack that sort of thing on when you haven't made it a goal from the beginning (and also when interoperability with a thoroughly impure language like Java is a major goal of the language).
That said, some believe it's possible and worthwhile to make the effort to document effects in Scala's type system. But I think purity in Scala is best treated as a matter of self-discipline, and you must be perpetually skeptical about the supposed purity of third-party code.
Laziness
Haskell is lazy by default but can be made stricter with some annotations sprinkled in your code... Scala is the opposite: strict by default but with the lazy keyword and by-name parameters you can make it as lazy as you like.
Feel free to keep things immutable. On the other hand, there's no side effect tracking, so you can't enforce or verify it.
As for non-strictness, here's the deal... First, if you choose to go completely non-strict, you'll be forsaking all of Scala's classes. Even Scalaz is not non-strict for the most part. If you are willing to build everything yourself, you can make your methods non-strict and your values lazy.
Next, I wonder if implicit parameters can be non-strict or not, or what would be the consequences of making them non-strict. I don't see a problem, but I could be wrong.
But, most problematic of all, function parameters are strict, and so are closure parameters.
So, while it is theoretically possible to go fully non-strict, it will be incredibly inconvenient.
