Why not mark val or var variables as private? - scala

Coming from a Java background I always mark instance variables as private. I'm learning Scala, and in almost all of the code I have viewed, the val/var members have default (public) access. Why is this the default? Doesn't it break the information hiding/encapsulation principle?

It would help if you specified which code, but keep in mind that some example code is in a simplified form to highlight whatever it is that the example is supposed to show you. Since the default access is public, the modifiers often get left off for simplicity.
That said, since a val is immutable, there's not much harm in leaving it public as long as you recognize that this is now part of the API for your class. That can be perfectly okay:
class DataThingy(data: Array[Double]) {
  val sum = data.sum
}
Or it can be an implementation detail that you shouldn't expose:
class Statistics(data: Array[Double]) {
  val sum = data.sum
  val sumOfSquares = data.map(x => x*x).sum
  val expectationSquared = (sum * sum)/(data.length*data.length)
  val expectationOfSquare = sumOfSquares/data.length
  val varianceOfSample = expectationOfSquare - expectationSquared
  val standardDeviation = math.sqrt(data.length*varianceOfSample/(data.length-1))
}
Here, we've littered our class with all of the intermediate steps for calculating standard deviation. And this is especially foolish given that this is not the most numerically stable way to calculate standard deviation with floating point numbers.
Rather than merely making all of these private, it is better style, if possible, to use local blocks or private[this] defs to perform the intermediate computations:
val sum = data.sum
val standardDeviation = {
  val sumOfSquares = ...
  ...
  math.sqrt(...)
}
or
val sum = data.sum
private[this] def findSdFromSquares(s: Double, ssq: Double) = { ... }
val standardDeviation = findSdFromSquares(sum, data.map(x => x*x).sum)
If you need to store a calculation for later use, then private val or private[this] val is the way to go, but if it's just an intermediate step on the computation, the options above are better.
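For completeness, here is a sketch (using the same naive formulas as above, purely to illustrate the scoping) of the Statistics class with the intermediate steps moved into a local block:
class Statistics(data: Array[Double]) {
  val sum = data.sum
  val standardDeviation = {
    // intermediate values are visible only inside this block
    val sumOfSquares = data.map(x => x*x).sum
    val expectationSquared = (sum * sum)/(data.length*data.length)
    val expectationOfSquare = sumOfSquares/data.length
    val varianceOfSample = expectationOfSquare - expectationSquared
    math.sqrt(data.length*varianceOfSample/(data.length-1))
  }
}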
Likewise, there's no harm in exposing a var if it is a part of the interface--a vector coordinate on a mutable vector for instance. But you should make them private (better yet: private[this], if you can!) when it's an implementation detail.

One important difference between Java and Scala here is that in Java you can not replace a public variable with getter and setter methods (or vice versa) without breaking source and binary compatibility. In Scala you can.
So in Java if you have a public variable, the fact that it's a variable will be exposed to the user and if you ever change it, the user has to change his code. In Scala you can replace a public var with a getter and setter method (or a public val with just a getter method) without the user ever knowing the difference. So in that sense no implementation details are exposed.
As an example, let's consider a rectangle class:
class Rectangle(val width: Int, val height: Int) {
  val area = width * height
}
Now what happens if we later decide that we don't want the area to be stored as a variable, but rather it should be calculated each time it's called?
In Java the situation would be like this: If we had used a getter method and a private variable, we could just remove the variable and change the getter method to calculate the area instead of using the variable. No changes to user code needed. But since we've used a public variable, we are now forced to break user code :-(
In Scala it's different: we can just change the val to def and that's it. No changes to user code needed.
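Concretely, that change could look like this:
class Rectangle(val width: Int, val height: Int) {
  def area = width * height   // now computed on each call; callers still just write rect.area
}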

Actually, some Scala developers tend to use default access too much. But you can find appropriate examples in famous Scala projects (for example, Twitter's Finagle).
On the other hand, creating objects as immutable values is the standard way in Scala. We don't need to hide attributes at all if they're completely immutable.

I'd like to answer the question with a slightly more general approach. I think the answer you are looking for has to do with the design paradigms on which Scala is built. Instead of the classical procedural / object-oriented approach you see in Java, functional programming is used to a much greater extent. I cannot cover all the code you mention, of course, but in general (well written) Scala code will not need a lot of mutability.
As pointed out by Rex, vals are immutable, so there are few reasons for them not to be public. But as I see it, the immutability is not a goal in itself but a result of functional programming. So if we consider functions as something like x -> function -> y, the function part becomes somewhat of a black box; we don't really care what it does, as long as it does it correctly. As the Haskell Wiki writes:
Purely functional programs typically operate on immutable data. Instead of altering existing values, altered copies are created and the original is preserved.
This also explains the lack of hiding, since the parts we traditionally wanted to hide away are executed inside the functions and are thus hidden anyway.
So, to cut things short, I would argue that mutability and information hiding have become largely redundant in Scala. And why clutter things up with getters and setters when it can be avoided?

Related

When to use implicit parameters

I've been using Scala at work, and I have a question related to implicit parameters.
Often I've seen executionContext defined in method definitions and also in class definitions.
At the same time I've seen classes that accepts case classes that contain configuration data (timeout, adapter, port, etc.) as regular parameters.
My question is: why, when passing configuration, is this parameter not defined as implicit?
Or the other way around what if executionContext would be defined as a regular parameter?
I'm trying to understand when to use implicit parameter and when not to use them.
EDIT: maybe the example of passing a case class is not the best example; it was just the first idea that came to my mind.
Conceptually, implicits are something "external" to the application logic, and explicit parameters are ... well ... explicit.
Consider a function def f(x: Double): Double = x*x
It is a pure function that transforms a given real number into another real number. It makes sense for x to be an explicit parameter, as it is an intrinsic part of what this function is.
Now, suppose you were implementing some sort of approximate algorithm for multiplication, and wanted to control the precision with which your function computes the answer.
You could do def f(x: Double, precision: Int): Double = ???. It would work, but is inconvenient and kinda clumsy:
Function definition no longer expresses the conceptual "nature" of the function being a pure transformation on the set of real numbers
It makes it complicated at the call site, because everyone using your function must now be aware of this additional parameter to pass around (imagine you are writing a library for non-engineer math majors to use: they understand abstract transformations and complex formulas, but couldn't care less about numeric precision; how often do you think about precision when you need to compute the area of a square?).
It also makes existing code harder to read and modify
So, to make it prettier, you can do def f(x: Double)(implicit precision: Int) = ???. This has the advantage of saying exactly what you want: "I have a transformation double => double, that will use the implied precision when the actual result is computed." Those math majors can now write their abstract formulas the way they are used to: val area = square(x), without polluting their logic with annoying configuration they don't really care about.
When to use this exactly is, certainly, a question of opinion and taste (which is expressly forbidden on SO). Someone could certainly argue about the above example that precision is actually a part of the transformation definition, because 5.429 and 5.4289 (the results of f(2.33)(3) and f(2.33)(4) respectively) are two different numbers.
So, in the end of the day, you just gotta use your judgement and your common sense to make a decision for every case you come across.
When using existing libraries, there is another consideration. Consider:
def foo(f: Future[Int], ec: ExecutionContext) =
  f.map { x => x*x }(ec)
    .map { _.toString }(ec)
    .foreach(println)(ec)
This would look a lot nicer and less messy if you made ec implicit, regardless of where you stand philosophically on whether to consider it a part of your transformation or not:
def foo(f: Future[Int])(implicit ec: ExecutionContext) =
  f.map { x => x*x }.map(_.toString).foreach(println)
Implicits can be used when:
you need only one value of some type
it is unambiguous how such value would be defined
this includes both manual definition as well as using metaprogramming to generate the value based on e.g. how its type is defined
Futures and Akka decided that passing some "globals" as implicits is a reasonable use case, so they would pass as implicits:
ExecutionContext
ActorSystem, Materializer
various configs like Timeout
in general things which you don't want to be put into some static field, but which are passed around everywhere.
However, the rest of the Scala world would solve this issue by using some abstraction that passes these things under the hood: builders of some sort, constructors, abstractions over (dependencies) => result functions, etc.
E.g. cats.effect.IO doesn't need to pass an ExecutionContext around because it passes its scheduler around when you run it. Only when you want to explicitly change the pool things are being run on do you have to use some method. In Monix, running things also requires you to pass a Scheduler at the end, when the whole computation is composed. So both approaches let you give up on passing around all these ExecutionContexts. In the case of Future it is necessary because you need to have control over thread pools, but you also evaluate things eagerly, and putting the ec in manually (futureA.flatMap(f)(ec)) would break for-comprehensions.
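To illustrate that last point, a small sketch (the names are made up) of a Future for-comprehension that works only because the ExecutionContext is implicit; there is no place to write the ec explicitly inside the for:
import scala.concurrent.{ExecutionContext, Future}

def combine(futureA: Future[Int], futureB: Future[Int])(implicit ec: ExecutionContext): Future[Int] =
  for {
    a <- futureA   // desugars to futureA.flatMap(...), which needs the implicit ec
    b <- futureB
  } yield a + b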
As a result, outside the Akka ecosystem and raw Futures, implicits are more often used to carry around type classes, as a means to decouple business logic from particular implementations, allow adding support for new types without modifying the code that uses these implementations, and so on. (There are tons of examples of type classes in Scala, so I'll skip them here.)
Usually, when I read about people using implicits to pass configs around, it is just a matter of time before it ends in grief. Akka and EC kind of require them, but you should just pass configs explicitly. You can group them into case classes to pass a bunch of them around, and it is not that much of an issue. You can also put all the things required as implicits explicitly into one place and do:
case class Configs(dbEC: EC, mapEC: EC)

class SomeBehavior(configs: Configs) {
  def someAction = {
    if (...) {
      implicit val ec: EC = configs.dbEC
      ...
    } else {
      implicit val ec: EC = configs.mapEC
      ...
    }
  }
}
to make them implicit only in the place that needs them. A good rule of thumb is: do you care if there is something passed around that you don't see right in the code? Usually the answer is yes, you would prefer to see it; the only exceptions are cases where it is somewhat obvious where the value comes from, or where you know the value will be fine and don't care where it came from.
There are a multitude of use-cases of implicit in Scala: under the hood, they boil down to leveraging the compiler's implicit resolution mechanism to fill in things that might not have explicitly been mentioned, but the use-cases are divergent enough that in Scala 3, each use-case (of those that survive into Scala 3...) gets encoded with a different keyword.
In the case of the execution context, implicit arguments are being used to mimic dynamic scope in a language which is normally statically scoped. The primary win from doing this is that it allows behavior further down the call stack to be decided-upon much further up the call stack without having to always explicitly pass on the behavior through the intervening layers of the stack (while providing a way for those intervening layers to cleanly force a different behavior).
Historically, a major example of this was for things like numeric precision. Many numeric operations end up being implemented through iterated refinement (e.g. when square-root was implemented in software, it might be implemented using Newton's method), which means there's a trade-off between speed of calculation and precision (i.e. accuracy). With dynamic scoping, there's a neat way to accomplish this: a global variable for the desired level of precision in mathematical results. Your numeric routine checks the value of that variable and governs itself accordingly. The difference from globals in a statically-scoped language is that when A calls B which calls C, if A sets the value of x to 1 and B sets it to 2, x will be 2 when checked in C or B, but once B returns to A, x will once again be 1 (in dynamically scoped languages, you can think of a global variable as really being a name for a stack of values, and the language implementation automatically pops the stack as appropriate).
Dynamic scoping was once fairly popular (especially in Lisps before the mid-to-late 1970s); nowadays the only places you really see it are Bourne shells (including bash) and Emacs Lisp, while some languages (Perl and Common Lisp are probably the two main examples) are hybrids: a variable gets declared in a special way to make it dynamically or statically scoped. Static scoping has pretty clearly won: it's easier for the language implementation or the programmer to reason about.
The cost of that ease is that, in our numeric computation example, we end up with something like the following:
def newtonSqrt(x: Double, precision: Int): Double = ???

/** Calculates the length of the hypotenuse of a right triangle with legs of given lengths */
def hypotenuse(x: Double, y: Double, precision: Int): Double =
  newtonSqrt(x*x + y*y, precision)
Thankfully, Scala supports default arguments, so we at least avoid having to write separate overloads just to supply a default precision. Arguably, the precision is exposing an implementation detail (the fact that our calculations aren't necessarily perfectly mathematically accurate): the important thing is that the length of the hypotenuse is the square root of the sum of the squares of the legs.
In Scala, we can make the precision implicit:
// DON'T ACTUALLY PASS AN INT IMPLICITLY!!!!!!
def newtonSqrt(x: Double)(implicit precision: Int): Double = ???
def hypotenuse(x: Double, y: Double)(implicit precision: Int): Double =
  newtonSqrt(x*x + y*y)
(It's actually really bad to ever pass a primitive or any type which could plausibly be used for something other than describing the behavior in question through the implicit mechanism: I'm doing it here for didactic clarity).
The compiler will effectively translate newtonSqrt(x*x + y*y) to (something very similar to) newtonSqrt(x*x + y*y, precision). Now callers to hypotenuse can decide to fix precision via an implicit val or to defer the choice to their callers by adding the implicit to their signature.
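Continuing the (deliberately bad-practice) didactic example, a caller might fix the precision like this, or defer the choice by adding the same implicit parameter to its own signature:
// still only for illustration -- don't pass bare Ints implicitly in real code
implicit val precision: Int = 10
val c = hypotenuse(3.0, 4.0)   // the compiler supplies precision = 10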
Dynamic scoping has long been controversial, so it's no surprise that even the constrained dynamic scoping this usage of implicit embeds is controversial. In Scala's case, it doesn't help that in many cases the tooling throws up its hands when it comes to helping you figure out implicits: most of the really furious compiler errors one encounters are related to missing implicits or collisions, and tracing to figure out which values are in the implicit scope at any time is not something the tooling has a history of helping people with. Thus there are many developers who have decided that explicitly threading through configuration is superior to using implicits.
It's largely a matter of taste and the situation whether this sort of behavior description is best passed implicitly or explicitly (and it's worth noting that the type-class pattern, especially without a hard requirement for coherence (that there be one and only one possible way to describe the behavior) as is typical in Scala, is just a special case of this behavior description).
I should also note that it isn't a binary choice between bundling a few settings into a case class vs. passing them implicitly: you can do both:
case class ProcessSettings(sys: ActorSystem, ec: ExecutionContext)

object ProcessSettings {
  implicit def implicitly(implicit sys: ActorSystem, ec: ExecutionContext): ProcessSettings =
    ProcessSettings(sys, ec)
}

def doStuff(x: SomeInput)(implicit settings: ProcessSettings)
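For illustration, assuming an Akka classic ActorSystem and treating someInput as a placeholder, usage could look roughly like this:
implicit val system: ActorSystem = ActorSystem("example")
implicit val ec: ExecutionContext = system.dispatcher
doStuff(someInput)   // the compiler fills in ProcessSettings.implicitly(system, ec)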

Can Try be lazy or eager in Scala?

AFAIK, Iterator.map is lazy while Vector.map is eager, basically because they are different types of monads.
I would like to know if there is any chance of having a EagerTry and LazyTry that behave just like the current Try, but with the latter (LazyTry) delaying the execution of the closure passed until the result is needed (if it is needed).
Please note that declaring things as lazy only works within a given scope in Scala. An alternative exists when passing parameters (by-name parameters). The question is how to achieve lazy behaviour when returning (lazy) values to an outer scope. Option is basically a collection of length 0 or 1; this would be an equivalent case for lazy collections (Iterator, Sequence) but limited to length 0 or 1 (like Option and Either). I'm particularly interested in Try, i.e. using LazyTry exactly as Try would be used. I guess this should be analogous in the other cases (Option and Either).
Please note that we already have an EagerTry: the current standard Try is eager. Unfortunately, the class is sealed; therefore, to have eager and lazy versions of the same abstraction we would need to define all three classes and implement two of them (as opposed to defining and implementing one). The point is returning a Try without other software layers having to worry about when that code is executed, i.e. abstraction.
Yes, it wouldn't be hard to write LazyTry. One possible approach:
import scala.util.Try

sealed class LazyTry[A](block: => A) {
  // the only place block is used
  private lazy val underlying: Try[A] = Try(block)

  def get = underlying.get
  def isSuccess = underlying.isSuccess
  ...
}

object LazyTry {
  def apply[A](block: => A): LazyTry[A] = new LazyTry[A](block)
  ...
}
Note that you don't have LazySuccess and LazyFailure, since you don't know which class to use before running block.
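A hypothetical usage of that sketch, just to show the deferred evaluation:
val lt = LazyTry { println("computing"); 21 * 2 }   // nothing is printed yet
val result = lt.get                                  // "computing" is printed here, exactly once
val again = lt.get                                   // reuses the cached Try, no recomputation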

Scala (method / function).hashCode static value?

I was curious whether the hashCode() function always returns the same value for a lambda or something similar in Scala.
My tests show a stable value that does not change, even across builds. Is this intended behavior, or may it change in the future?
If this were stable behavior, it would help me a lot in building my library.
EDIT:
Let's take this source code:
object Main {
  def main(args: Array[String]): Unit = {
    val x = (s: String) => 1
    val y = (s: String) => 2
    println(x.hashCode())
    println(y.hashCode())
  }
}
Its output on the console is, for me, always 1792393294 and 226170135.
What I'm currently doing is implementing a parser combinator library, which I have implemented in several languages. I need to know when wrapper classes are the same (e.g. the underlying functions are the same) so I can implement something like a call stack, which I need in order to parse as far as possible on failure while preventing endless recursion on error.
Thanks in advance!
The default implementation for hashCode (at least in the Oracle JVM) is in terms of the (initial) memory address of that particular object. So if your program has constructed exactly the same sized objects in exactly the same order before constructing that object, it will in practice return the same value every time.
But this is not at all reliable; most programs do not do exactly the same things every time they run. As soon as you're doing something like responding to user input, that perfect reproducibility will disappear - e.g. maybe you sometimes add enough entries to a HashMap to trigger an enlargement of the table, and sometimes not. And if you construct the same value later in the program, it will of course have a different address; try doing
val z = (s: String) => 1
and observe that it will have a different hashCode from x. Not to mention that the numbers may well be different across different JVMs, different versions of the same JVM, or even when the same JVM is launched with a different -Xms setting.
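For example (the printed numbers are, of course, only illustrative):
println(x.hashCode())   // e.g. 1792393294, as before
println(z.hashCode())   // some different value, even though z's body is identical to x's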
Computers are often a lot more deterministic in practice than in theory. But this is not the kind of thing that's specified to happen, and certainly not something to rely on in your programs.

Scala lazy collection growth

This question is a bit more theoretical.
I have an object which holds a private mutable list or map whose growth is append-only. I believe I could argue that the object itself is functional, being referentially transparent and, to all appearances, immutable. So for example I have
import scala.collection.mutable.Map

case class Foo(bar: String)

object FooContainer {
  private val foos = Map.empty[String, Foo]

  def getFoo(fooName: String) = foos.getOrElseUpdate(fooName, Foo(fooName))
}
I could do a similar thing with lists. Now I imagine that to be truly functional I would need to ensure thread safety, which I suppose I could do simply using locks, synchronized blocks or atomic references. My questions are: is this all necessary and sufficient, is it common practice, and are there established patterns for this behaviour?
I would say that "functional" is really the wrong term. The exact example shown above could be called "pure" in the sense that its output is only ever defined by its input, thus not having any (visible) side-effects. In reality it is impure though, since it has hidden internal state.
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices. Wikipedia
The apparent purity vanishes, though, if you make the Foo class mutable:
case class Foo(bar: String) {
  private var mutable: Int = 1
  def setFoo(x: Int) = mutable = x
  def foo = mutable
}
The mutability of the Foo class results in getFoo being impure:
scala> FooContainer.getFoo("test").foo
res5: Int = 1
scala> FooContainer.getFoo("bla").setFoo(5)
scala> FooContainer.getFoo("bla").foo
res7: Int = 5
In order for the function to be apparently pure, FooContainer.getFoo("bla").foo must always return the same value. Therefore, the "purity" of the described construct is fragile at best.
As a general rule, you're better off with a var holding an immutable collection type which is replaced when updated than you are with a val holding a mutable type. The former never deals in values that may change after being created (other than the class which holds the evolving value itself). Then you only need to be moderately careful about when updated values held by the var are published to it, but since updating a reference is generally atomic (apart from things like storing a long or a double to RAM on a 32-bit machine), there's little risk.
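As a rough sketch of that alternative (reworking FooContainer from the question; the synchronized block is just one simple way to publish updates safely, not the only option):
import scala.collection.immutable

object FooContainer {
  private var foos = immutable.Map.empty[String, Foo]

  def getFoo(fooName: String): Foo = synchronized {
    foos.get(fooName) match {
      case Some(foo) => foo
      case None =>
        val foo = Foo(fooName)
        foos = foos + (fooName -> foo)   // replace the whole immutable map
        foo
    }
  }
}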

Configuration data in Scala -- should I use the Reader monad?

How do I create a properly functional configurable object in Scala? I have watched Tony Morris' video on the Reader monad and I'm still unable to connect the dots.
I have a hard-coded list of Client objects:
case class Client(name: String, age: Int){ /* etc */ }

object Client {
  //Horrible!
  val clients = List(Client("Bob", 20), Client("Cindy", 30))
}
I want Client.clients to be determined at runtime, with the flexibility of either reading it from a properties file or from a database. In the Java world I'd define an interface, implement the two types of source, and use DI to assign a class variable:
trait ConfigSource {
  def clients: List[Client]
}

object ConfigFileSource extends ConfigSource {
  override def clients = buildClientsFromProperties(Properties("clients.properties"))
  //...etc, read properties files
}

object DatabaseSource extends ConfigSource { /* etc */ }

object Client {
  @Resource("configuration_source")
  private var config: ConfigSource = _ //Inject it at runtime

  val clients = config.clients
}
This seems like a pretty clean solution to me (not a lot of code, clear intent), but that var does jump out (OTOH, it doesn't seem really troublesome to me, since I know it will be injected once and only once).
What would the Reader monad look like in this situation and, explain it to me like I'm 5, what are its advantages?
Let's start with a simple, superficial difference between your approach and the Reader approach, which is that you no longer need to hang onto config anywhere at all. Let's say you define the following vaguely clever type synonym:
type Configured[A] = ConfigSource => A
Now, if I ever need a ConfigSource for some function, say a function that gets the n'th client in the list, I can declare that function as "configured":
def nthClient(n: Int): Configured[Client] = {
  config => config.clients(n)
}
So we're essentially pulling a config out of thin air, any time we need one! Smells like dependency injection, right? Now let's say we want the ages of the first, second and third clients in the list (assuming they exist):
def ages: Configured[(Int, Int, Int)] =
  for {
    a0 <- nthClient(0)
    a1 <- nthClient(1)
    a2 <- nthClient(2)
  } yield (a0.age, a1.age, a2.age)
For this, of course, you need some appropriate definition of map and flatMap. I won't get into that here, but will simply say that Scalaz (or Rúnar's awesome NEScala talk, or Tony's which you've seen already) gives you all you need.
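For the curious, a minimal hand-rolled sketch (not the Scalaz machinery) of the map and flatMap that make the for-comprehension above compile:
// wraps any Configured[A] (i.e. ConfigSource => A) with map/flatMap
implicit class ConfiguredOps[A](run: Configured[A]) {
  def map[B](f: A => B): Configured[B] =
    config => f(run(config))
  def flatMap[B](f: A => Configured[B]): Configured[B] =
    config => f(run(config))(config)
}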
The important point here is that the ConfigSource dependency and its so-called injection are mostly hidden. The only "hint" that we can see here is that ages is of type Configured[(Int, Int, Int)] rather than simply (Int, Int, Int). We didn't need to explicitly reference config anywhere.
As an aside, this is the way I almost always like to think about monads: they hide their effect so it's not polluting the flow of your code, while explicitly declaring the effect in the type signature. In other words, you needn't repeat yourself too much: you say "hey, this function deals with effect X" in the function's return type, and don't mess with it any further.
In this example, of course, the effect is to read from some fixed environment. Another monadic effect you might be familiar with is error-handling: we can say that Option hides the error-handling logic while making the possibility of errors explicit in your method's type. Or, sort of the opposite of reading, the Writer monad hides the thing we're writing to while making its presence explicit in the type system.
Now finally, just as we normally need to bootstrap a DI framework (somewhere outside our usual flow of control, such as in an XML file), we also need to bootstrap this curious monad. Surely we'll have some logical entry point to our code, such as:
def run: Configured[Unit] = // ...
It ends up being pretty simple: since Configured[A] is just a type synonym for the function ConfigSource => A, we can just apply the function to its "environment":
run(ConfigFileSource)
// or
run(DatabaseSource)
Ta-da! So, contrasting with the traditional Java-style DI approach, we don't have any "magic" occurring here. The only magic, as it were, is encapsulated in the definition of our Configured type and the way it behaves as a monad. Most importantly, the type system keeps us honest about which "realm" dependency injection is occurring in: anything with type Configured[...] is in the DI world, and anything without it is not. We simply don't get this in old-school DI, where everything is potentially managed by the magic, so you don't really know which portions of your code are safe to reuse outside of a DI framework (for example, within your unit tests, or in some other project entirely).
update: I wrote up a blog post which explains Reader in greater detail.