Scala (method / function).hashCode static value? - scala

I' was curious if the hashCode() function returns always the same value for a lambda or something in Scala?
My tests have shown to me some static value that does not change even over builds. Is this intended behavior or may it change in future?
If this was some static behavior it would help me a lot building my library.
EDIT:
Let's take this source code:
object Main {
def main(args: Array[String]): Unit = {
val x = (s: String) => 1
val y = (s: String) => 2
println(x.hashCode())
println(y.hashCode())
}
}
It's output on console is for me always 1792393294 and 226170135.
What I'm currently doing is implementing a parser combinator library which I implemented in several languages. And I need to know when wrapper classes are the same (e.g. the underlying functions are the same) so I can implement something like a call stack, which I need to parse as far as possible on failure but prevent endless recursion on error.
Thanks in advance!

The default implementation for hashCode (at least in the Oracle JVM) is in terms of the (initial) memory address of that particular object. So if your program has constructed exactly the same sized objects in exactly the same order before constructing that object, it will in practice return the same value every time.
But this is not at all reliable; most programs do not do exactly the same things every time they run. As soon as you're doing something like responding to user input, that perfect reproducibility will disappear - e.g. maybe you sometimes add enough entries to a HashMap to trigger an enlargement of the table, and sometimes not. And if you construct the same value later in the program, it will of course have a different address; try doing
val z = (s: String) => 1
and observe that it will have a different hashCode from x. Not to mention that the numbers may well be different across different JVMs, different versions of the same JVM, or even when the same JVM is launched with a different -Xms setting.
Computers are often a lot more deterministic in practice than in theory. But this is not the kind of thing that's specified to happen, and certainly not something to rely on in your programs.

Related

When to use implicit parameters

I've been using Scala at work, and I have a question related to implicit parameters.
Often I've seen executionContext defined in method definitions and also in class definitions.
At the same time I've seen classes that accepts case classes that contain configuration data (timeout, adapter, port, etc.) as regular parameters.
My question is why when passing configuration this parameter is not defined as implicit?
Or the other way around what if executionContext would be defined as a regular parameter?
I'm trying to understand when to use implicit parameter and when not to use them.
EDIT: maybe the example of passing a case class is not the best example, it was the first idea that comes to my mind
Conceptually, implicits are something "external" to the application logic, and explicit parameters are ... well ... explicit.
Consider a function def f(x: Double): Double = x*x
It is a pure function that transforms a given real number into another real number. It makes sense for x to be an explicit parameter, as it is an intrinsic part of what this function is.
Now, suppose, you were implementing some sort of approximate algorithm for multiplication, and wanted to control the precision with which you function computes the answer.
You could do def f(x: Double, precision: Int): Double = ???. It would work, but is inconvenient and kinda clumsy:
Function definition no longer expresses the conceptual "nature" of the function being a pure transformation on the set of real numbers
It makes it complicated at the call site, because everyone using your function must now be aware of this additional parameter to pass around (imagine, you are writing a library for non–engineer math majors to use, they understand abstract transformations and complex formulas, but could care less about numeric precision: how often do you think about precision when you need to compute an area of a square?).
It also makes existing code harder to read and modify
So, to make it prettier, you can do def f(x: Double)(implicit precision: Int) = ???. This has an advantage of saying exactly what you want: "I have a transformation double => double, that will use the implied precision when the actual result is computed). Those math majors can now write their abstract formulas the way they are used to: val area = square(x) without polluting their logic with annoying configurations they don't really care about.
When to use this exactly is, certainly, a question of opinion and taste (which is expressly forbidden on SO). Someone can certainly argue about the above example, that precision is actually a part of the transformation definition, because 5.429 and 5.54289 (results of f(2.33)(3) and f(2.33)(4) respectively) are two different numbers.
So, in the end of the day, you just gotta use your judgement and your common sense to make a decision for every case you come across.
When using existing libraries, there is another consideration. Consider:
def foo(f: Future[Int], ec: ExecutionContext) =
f.map { x => x*x }(ec)
.map { _.toString } (ec)
.foreach(println)(ec)
This would look a lot nicer and less messy if you made ec implicit, regardless of where you stand philosophically on whether to consider it a part of your transformation or not:
def foo(f: Future[Int])(implicit ec: ExecutionContext) =
f.map { x => x*x }.map(_.toString).foreach(println)
Implicits can be used when:
you need only one value of some type
it is unambiguous how such value would be defined
this includes both manual definition as well as using metaprogramming to generate the value based on e.g. how its type is defined
Futures and Akka decided that passing some "globals" as implicits is a reasonable use case, so they would pass as implicits:
ExecutionContext
ActorSystem, Materializer
various configs like Timeout
in general things which you don't want to be put into some static field, but which are passed around everywhere.
However, the rest of Scala world would solve this issue by using some abstraction that would pass these things under the hood, some sort of builders, via constructors, abstractions over (dependencies) => result functions, etc.
E.g. cats.effect.IO don't need to pass ExecutionContext around because it passes its scheduler around when you run it. Only when you want to explicitly change the pool things are being run on you have to use some method. In Monix running things also require you to pass Scheduler at the end, when whole computation is composed. So both approached let you give up on passing around all these ExecutionContexts. In case of Future it is necessary because you need to have control over thread pools, but you also evaluate things eagerly, and putting ec (futureA.flatMap(f)(ec)) manually would break for-comprehension.
As a result, outside Akka ecosystem and raw Futures, are more often used to carry around type-classes, as a mean to decouple business logic from particular implementation, allow adding support for new types without modifying code that uses these implementations, and so on. (There are tons of examples of type-classes in Scala so I'll skip it here).
Usually, when I read about people using implicits to pass configs around, it is just a matter of time before it ends up with grief. Akka and EC kind of requires them but you should just pass configs explicitly. You can group them into case classes to pass bunch of them around and it is not that much of an issue. You can also put all things required as implicits explicitly into one place and do:
case class Configs(dbEX: EC, mapEC: EC)
class SomeBehavior(configs: Configs) {
def someAction = {
if (...) {
implicit val ec: EC = configs.dbEC
...
} else {
implicit val ec: EC = configs.mapEC
...
}
}
}
to make them implicit only in the place that needs them. A good role of thumb is: do you care if there is something passed around that you don't see right in the code? Usually, the answer is, yes you do, you would prefer to see it, with only exceptions being cases when it would be somewhat obvious where does the value come from, or if you kinda knew that the value would be ok and you didn't bother where it came from.
There are a multitude of use-cases of implicit in Scala: under the hood, they boil down to leveraging the compiler's implicit resolution mechanism to fill in things that might not have explicitly been mentioned, but the use-cases are divergent enough that in Scala 3, each use-case (of those that survive into Scala 3...) gets encoded with a different keyword.
In the case of the execution context, implicit arguments are being used to mimic dynamic scope in a language which is normally statically scoped. The primary win from doing this is that it allows behavior further down the call stack to be decided-upon much further up the call stack without having to always explicitly pass on the behavior through the intervening layers of the stack (while providing a way for those intervening layers to cleanly force a different behavior).
Historically, a major example of this was for things like numeric precision. Many numeric operations end up being implemented through iterated refinement (e.g. when square-root was implemented in software, it might be implemented using Newton's method), which means there's a trade-off between speed of calculation and precision (suggesting accuracy). With dynamic scoping, there's a neat way to accomplish this: a global variable for the desired level of precision in mathematical results. Your numeric routine checks the value of that variable and governs itself accordingly. The difference from globals in a statically-scoped language is that when A calls B which calls C, if A sets the value of x to 1 and B sets it to 2, x will be 2 when checked in C or B, but once B returns to A, x will once again be 1 (in dynamically scoped languages, you can think of a global variable as really being a name for a stack of values, and the language implementation automatically pops the stack as appropriate).
Dynamic scoping was once fairly popular (especially so in Lisps before the mid/late 1970s); nowadays the only places you really see it are in Bourne shells (including bash), Emacs Lisp; while some languages (Perl and Common Lisp are probably the two main examples) are hybrids: a variable gets declared in a special way to make it dynamically or statically scoped. Static scoping has pretty clearly won: it's easier for the language implementation or the programmer to reason about.
The cost of that ease is that, in our numeric computation example, we end up with something like the following:
def newtonSqrt(x: Double, precision: Int): Double = ???
/** Calculates the length of the hypotenuse of a right triangle with legs of given lengths
*/
def hypotenuse(x: Double, y: Double, precision: Int): Double =
newtonSqrt(x*x + y*y, precision)
Thankfully, Scala supports default arguments, so we avoid having versions that use a default precision, too. Arguably, the precision is exposing an implementation detail (the fact that our calculations aren't necessarily perfectly mathematically accurate): the important thing is that the length of the hypotenuse is the square root of the sum of the squares of the legs.
In Scala, we can make the precision implicit:
// DON'T ACTUALLY PASS AN INT IMPLICITLY!!!!!!
def newtonSqrt(x: Double)(implicit precision: Int): Double = ???
def hypotenuse(x: Double, y: Double)(implicit precision: Int): Double =
newtonSqrt(x*x, y*y)
(It's actually really bad to ever pass a primitive or any type which could plausibly be used for something other than describing the behavior in question through the implicit mechanism: I'm doing it here for didactic clarity).
The compiler will effectively translate newtonSqrt(x*x + y*y) to (something very similar to) newtonSqrt(x*x + y*y, precision). Now callers to hypotenuse can decide to fix precision via an implicit val or to defer the choice to their callers by adding the implicit to their signature.
Dynamic scoping has long been controversial, so it's no surprise that even the constrained dynamic scoping this usage of implicit embeds is controversial. In Scala's case, it doesn't help that in many cases the tooling throws up its hands when it comes to helping you figure out implicits: most of the really furious compiler errors one encounters are related to missing implicits or collisions, and tracing to figure out which values are in the implicit scope at any time is not something the tooling has a history of helping people with. Thus there are many developers who have decided that explicitly threading through configuration is superior to using implicits.
It's largely a matter of taste and the situation whether this sort of behavior description is best passed implicitly or explicitly (and it's worth noting that the type-class pattern, especially without a hard requirement for coherence (that there be one and only one possible way to describe the behavior) as is typical in Scala, is just a special case of this behavior description).
I should also note that it isn't a binary choice between bundling a few settings into a case class vs. passing them implicitly: you can do both:
case class ProcessSettings(sys: ActorSystem, ec: ExecutionContext)
object ProcessSettings {
implicit def implicitly(implicit sys: ActorSystem, ec: ExecutionContext): ProcessSettings =
ProcessSettings(sys, ec)
}
def doStuff(x: SomeInput)(implicit settings: ProcessSettings)

Understanding the continuation theorem in Scala

So, I was trying to learn about Continuation. I came across with the following saying (link):
Say you're in the kitchen in front of the refrigerator, thinking about a sandwich. You take a continuation right there and stick it in your pocket. Then you get some turkey and bread out of the refrigerator and make yourself a sandwich, which is now sitting on the counter. You invoke the continuation in your pocket, and you find yourself standing in front of the refrigerator again, thinking about a sandwich. But fortunately, there's a sandwich on the counter, and all the materials used to make it are gone. So you eat it. :-) — Luke Palmer
Also, I saw a program in Scala:
var k1 : (Unit => Sandwich) = null
reset {
shift { k : Unit => Sandwich) => k1 = k }
makeSandwich
}
val x = k1()
I don't really know the syntax of Scala (looks similar to Java and C mixed together) but I would like to understand the concept of Continuation.
Firstly, I tried to run this program (by adding it into main). But it fails, I think that it has a syntax error due to the ) near Sandwich but I'm not sure. I removed it but it still does not compile.
How to create a fully compiled example that shows the concept of the story above?
How this example shows the concept of Continuation.
In the link above there was the following saying: "Not a perfect analogy in Scala because makeSandwich is not executed the first time through (unlike in Scheme)". What does it mean?
Since you seem to be more interested in the concept of the "continuation" rather than specific code, let's forget about that code for a moment (especially because it is quite old and I don't really like those examples because IMHO you can't understand them correctly unless you already know what a continuation is).
Note: this is a very long answer with some attempts to describe what a continuations is and why it is useful. There are some examples in Scala-like pseudo-code none of which can actually be compiled and run (there is just one compilable example at the very end and it references another example from the middle of the answer). Expect to spend a significant amount of time just reading this answer.
Intro to continuations
Probably the first thing you should do to understand a continuation is to forget about how modern compilers for most of the imperative languages work and how most of the modern CPUs work and particularly the idea of the call stack. This is actually implementation details (although quite popular and quite useful in practice).
Assume you have a CPU that can execute some sequence of instructions. Now you want to have a high level languages that support the idea of methods that can call each other. The obvious problem you face is that the CPU needs some "forward only" sequence of commands but you want some way to "return" results from a sub-program to the caller. Conceptually it means that you need to have some way to store somewhere before the call all the state of the caller method that is required for it to continue to run after the result of the sub-program is computed, pass it to the sub-program and then ask the sub-program at the end to continue execution from that stored state. This stored state is exactly a continuation. In most of the modern environments those continuations are stored on the call stack and often there are some assembly instructions specifically designed to help handling it (like call and return). But again this is just implementation details. Potentially they might be stored in an arbitrary way and it will still work.
So now let's re-iterate this idea: a continuation is a state of the program at some point that is enough to continue its execution from that point, typically with no additional input or some small known input (like a return value of the called method). Running a continuation is different from a method call in that usually continuation never explicitly returns execution control back to the caller, it can only pass it to another continuation. Potentially you can create such a state yourself, but in practice for the feature to be useful you need some support from the compiler to build continuations automatically or emulate it in some other way (this is why the Scala code you see requires a compiler plugin).
Asynchronous calls
Now there is an obvious question: why continuations are useful at all? Actually there are a few more scenarios besides the simple "return" case. One such scenario is asynchronous programming. Actually if you do some asynchronous call and provide a callback to handle the result, this can be seen as passing a continuation. Unfortunately most of the modern languages do not support automatic continuations so you have to grab all the relevant state yourself. Another problem appears if you have some logic that needs a sequence of many async calls. And if some of the calls are conditional, you easily get to the callbacks hell. The way continuations help you avoid it is by allowing you build a method with effectively inverted control flow. With typical call it is the caller that knows the callee and expects to get a result back in a synchronous way. With continuations you can write a method with several "entry points" (or "return to points") for different stages of the processing logic that you can just pass to some other method and that method can still return to exactly that position.
Consider following example (in pseudo-code that is Scala-like but is actually far from the real Scala in many details):
def someBusinessLogic() = {
val userInput = getIntFromUser()
val firstServiceRes = requestService1(userInput)
val secondServiceRes = if (firstServiceRes % 2 == 0) requestService2v1(userInput) else requestService2v2(userInput)
showToUser(combineUserInputAndResults(userInput,secondServiceRes))
}
If all those calls a synchronous blocking calls, this code is easy. But assume all those get and request calls are asynchronous. How to re-write the code? The moment you put the logic in callbacks you loose the clarity of the sequential code. And here is where continuations might help you:
def someBusinessLogicCont() = {
// the method entry point
val userInput
getIntFromUserAsync(cont1, captureContinuationExpecting(entry1, userInput))
// entry/return point after user input
entry1:
val firstServiceRes
requestService1Async(userInput, captureContinuationExpecting(entry2, firstServiceRes))
// entry/return point after the first request to the service
entry2:
val secondServiceRes
if (firstServiceRes % 2 == 0) {
requestService2v1Async(userInput, captureContinuationExpecting(entry3, secondServiceRes))
// entry/return point after the second request to the service v1
entry3:
} else {
requestService2v2Async(userInput, captureContinuationExpecting(entry4, secondServiceRes))
// entry/return point after the second request to the service v2
entry4:
}
showToUser(combineUserInputAndResults(userInput, secondServiceRes))
}
It is hard to capture the idea in a pseudo-code. What I mean is that all those Async method never return. The only way to continue execution of the someBusinessLogicCont is to call the continuation passed into the "async" method. The captureContinuationExpecting(label, variable) call is supposed to create a continuation of the current method at the label with the input (return) value bound to the variable. With such a re-write you still has a sequential-looking business logic even with all those asynchronous calls. So now for a getIntFromUserAsync the second argument looks like just another asynchronous (i.e. never-returning) method that just requires one integer argument. Let's call this type Continuation[T]
trait Continuation[T] {
def continue(value: T):Nothing
}
Logically Continuation[T] looks like a function T => Unit or rather T => Nothing where Nothing as the return type signifies that the call actually never returns (note, in actual Scala implementation such calls do return, so no Nothing there, but I think conceptually it is easy to think about no-return continuations).
Internal vs external iteration
Another example is a problem of iteration. Iteration can be internal or external. Internal iteration API looks like this:
trait CollectionI[T] {
def forEachInternal(handler: T => Unit): Unit
}
External iteration looks like this:
trait Iterator[T] {
def nextValue(): Option[T]
}
trait CollectionE[T] {
def forEachExternal(): Iterator[T]
}
Note: often Iterator has two method like hasNext and nextValue returning T but it will just make the story a bit more complicated. Here I use a merged nextValue returning Option[T] where the value None means the end of the iteration and Some(value) means the next value.
Assuming the Collection is implemented by something more complicated than an array or a simple list, for example some kind of a tree, there is a conflict here between the implementer of the API and the API user if you use typical imperative language. And the conflict is over the simple question: who controls the stack (i.e. the easy to use state of the program)? The internal iteration is easier for the implementer because he controls the stack and can easily store whatever state is needed to move to the next item but for the API user the things become tricky if she wants to do some aggregation of the stored data because now she has to save the state between the calls to the handler somewhere. Also you need some additional tricks to let the user stop the iteration at some arbitrary place before the end of the data (consider you are trying to implement find via forEach). Conversely the external iteration is easy for the user: she can store all the state necessary to process data in any way in local variables but the API implementer now has to store his state between calls to the nextValue somewhere else. So fundamentally the problem arises because there is only one place to easily store the state of "your" part of the program (the call stack) and two conflicting users for that place. It would be nice if you could just have two different independent places for the state: one for the implementer and another for the user. And continuations provide exactly that. The idea is that we can pass execution control between two methods back and forth using two continuations (one for each part of the program). Let's change the signatures to:
// internal iteration
// continuation of the iterator
type ContIterI[T] = Continuation[(ContCallerI[T], ContCallerLastI)]
// continuation of the caller
type ContCallerI[T] = Continuation[(T, ContIterI[T])]
// last continuation of the caller
type ContCallerLastI = Continuation[Unit]
// external iteration
// continuation of the iterator
type ContIterE[T] = Continuation[ContCallerE[T]]
// continuation of the caller
type ContCallerE[T] = Continuation[(Option[T], ContIterE[T])]
trait Iterator[T] {
def nextValue(cont : ContCallerE[T]): Nothing
}
trait CollectionE[T] {
def forEachExternal(): Iterator[T]
}
trait CollectionI[T] {
def forEachInternal(cont : ContCallerI[T]): Nothing
}
Here ContCallerI[T] type, for example, means that this is a continuation (i.e. a state of the program) the expects two input parameters to continue running: one of type T (the next element) and another of type ContIterI[T] (the continuation to switch back). Now you can see that the new forEachInternal and the new forEachExternal+Iterator have almost the same signatures. The only difference in how the end of the iteration is signaled: in one case it is done by returning None and in other by passing and calling another continuation (ContCallerLastI).
Here is a naive pseudo-code implementation of a sum of elements in an array of Int using these signatures (an array is used instead of something more complicated to simplify the example):
class ArrayCollection[T](val data:T[]) : CollectionI[T] {
def forEachInternal(cont0 : ContCallerI[T], lastCont: ContCallerLastI): Nothing = {
var contCaller = cont0
for(i <- 0 to data.length) {
val contIter = captureContinuationExpecting(label, contCaller)
contCaller.continue(data(i), contIter)
label:
}
}
}
def sum(arr: ArrayCollection[Int]): Int = {
var sum = 0
val elem:Int
val iterCont:ContIterI[Int]
val contAdd0 = captureContinuationExpecting(labelAdd, elem, iterCont)
val contLast = captureContinuation(labelReturn)
arr.forEachInternal(contAdd0, contLast)
labelAdd:
sum += elem
val contAdd = captureContinuationExpecting(labelAdd, elem, iterCont)
iterCont.continue(contAdd)
// note that the code never execute this line, the only way to jump out of labelAdd is to call contLast
labelReturn:
return sum
}
Note how both implementations of the forEachInternal and of the sum methods look fairly sequential.
Multi-tasking
Cooperative multitasking also known as coroutines is actually very similar to the iterations example. Cooperative multitasking is an idea that the program can voluntarily give up ("yield") its execution control either to the global scheduler or to another known coroutine. Actually the last (re-written) example of sum can be seen as two coroutines working together: one doing iteration and another doing summation. But more generally your code might yield its execution to some scheduler that then will select which other coroutine to run next. And what the scheduler does is manages a bunch of continuations deciding which to continue next.
Preemptive multitasking can be seen as a similar thing but the scheduler is run by some hardware interruption and then the scheduler needs a way to create a continuation of the program being executed just before the interruption from the outside of that program rather than from the inside.
Scala examples
What you see is a really old article that is referring to Scala 2.8 (while current versions are 2.11, 2.12, and soon 2.13). As #igorpcholkin correctly pointed out, you need to use a Scala continuations compiler plugin and library. The sbt compiler plugin page has an example how to enable exactly that plugin (for Scala 2.12 and #igorpcholkin's answer has the magic strings for Scala 2.11):
val continuationsVersion = "1.0.3"
autoCompilerPlugins := true
addCompilerPlugin("org.scala-lang.plugins" % "scala-continuations-plugin_2.12.2" % continuationsVersion)
libraryDependencies += "org.scala-lang.plugins" %% "scala-continuations-library" % continuationsVersion
scalacOptions += "-P:continuations:enable"
The problem is that plugin is semi-abandoned and is not widely used in practice. Also the syntax has changed since the Scala 2.8 times so it is hard to get those examples running even if you fix the obvious syntax bugs like missing ( here and there. The reason of that state is stated on the GitHub as:
You may also be interested in https://github.com/scala/async, which covers the most common use case for the continuations plugin.
What that plugin does is emulates continuations using code-rewriting (I suppose it is really hard to implement true continuations over the JVM execution model). And under such re-writings a natural thing to represent a continuation is some function (typically called k and k1 in those examples).
So now if you managed to read the wall of text above, you can probably interpret the sandwich example correctly. AFAIU that example is an example of using continuation as means to emulate "return". If we re-sate it with more details, it could go like this:
You (your brain) are inside some function that at some points decides that it wants a sandwich. Luckily you have a sub-routine that knows how to make a sandwich. You store your current brain state as a continuation into the pocket and call the sub-routine saying to it that when the job is done, it should continue the continuation from the pocket. Then you make a sandwich according to that sub-routine messing up with your previous brain state. At the end of the sub-routine it runs the continuation from the pocket and you return to the state just before the call of the sub-routine, forget all your state during that sub-routine (i.e. how you made the sandwich) but you can see the changes in the outside world i.e. that the sandwich is made now.
To provide at least one compilable example with the current version of the scala-continuations, here is a simplified version of my asynchronous example:
case class RemoteService(private val readData: Array[Int]) {
private var readPos = -1
def asyncRead(callback: Int => Unit): Unit = {
readPos += 1
callback(readData(readPos))
}
}
def readAsyncUsage(rs1: RemoteService, rs2: RemoteService): Unit = {
import scala.util.continuations._
reset {
val read1 = shift(rs1.asyncRead)
val read2 = if (read1 % 2 == 0) shift(rs1.asyncRead) else shift(rs2.asyncRead)
println(s"read1 = $read1, read2 = $read2")
}
}
def readExample(): Unit = {
// this prints 1-42
readAsyncUsage(RemoteService(Array(1, 2)), RemoteService(Array(42)))
// this prints 2-1
readAsyncUsage(RemoteService(Array(2, 1)), RemoteService(Array(42)))
}
Here remote calls are emulated (mocked) with a fixed data provided in arrays. Note how readAsyncUsage looks like a totally sequential code despite the non-trivial logic of which remote service to call in the second read depending on the result of the first read.
For full example you need prepare Scala compiler to use continuations and also use a special compiler plugin and library.
The simplest way is a create a new sbt.project in IntellijIDEA with the following files: build.sbt - in the root of the project, CTest.scala - inside main/src.
Here is contents of both files:
build.sbt:
name := "ContinuationSandwich"
version := "0.1"
scalaVersion := "2.11.6"
autoCompilerPlugins := true
addCompilerPlugin(
"org.scala-lang.plugins" % "scala-continuations-plugin_2.11.6" % "1.0.2")
libraryDependencies +=
"org.scala-lang.plugins" %% "scala-continuations-library" % "1.0.2"
scalacOptions += "-P:continuations:enable"
CTest.scala:
import scala.util.continuations._
object CTest extends App {
case class Sandwich()
def makeSandwich = {
println("Making sandwich")
new Sandwich
}
var k1 : (Unit => Sandwich) = null
reset {
shift { k : (Unit => Sandwich) => k1 = k }
makeSandwich
}
val x = k1()
}
What the code above essentially does is calling makeSandwich function (in a convoluted manner). So execution result would be just printing "Making sandwich" into console. The same result would be achieved without continuations:
object CTest extends App {
case class Sandwich()
def makeSandwich = {
println("Making sandwich")
new Sandwich
}
val x = makeSandwich
}
So what's the point? My understanding is that we want to "prepare a sandwich", ignoring the fact that we may be not ready for that. We mark a point of time where we want to return to after all necessary conditions are met (i.e. we have all necessary ingredients ready). After we fetch all ingredients we can return to the mark and "prepare a sandwich", "forgetting that we were unable to do that in past". Continuations allow us to "mark point of time in past" and return to that point.
Now step by step. k1 is a variable to hold a pointer to a function which should allow to "create sandwich". We know it because k1 is declared so: (Unit => Sandwich).
However initially the variable is not initialized (k1 = null, "there are no ingredients to make a sandwich yet"). So we can't call the function preparing sandwich using that variable yet.
So we mark a point of execution where we want to return to (or point of time in past we want to return to) using "reset" statement.
makeSandwich is another pointer to a function which actually allows to make a sandwich. It's the last statement of "reset block" and hence it is passed to "shift" (function) as argument (shift { k : (Unit => Sandwich).... Inside shift we assign that argument to k1 variable k1 = k thus making k1 ready to be called as a function. After that we return to execution point marked by reset. The next statement is execution of function pointed to by k1 variable which is now properly initialized so finally we call makeSandwich which prints "Making sandwich" to a console. It also returns an instance of sandwich class which is assigned to x variable.
Not sure, probably it means that makeSandwich is not called inside reset block but just afterwards when we call it as k1().

Avoid testing duplicate values with ScalaTest forAll

I'm playing with property-based testing on ScalaTest and I had the following code:
val myStrings = Gen.oneOf("hi", "hello")
forAll(myStrings) { s: String =>
println(s"String tested: $s")
}
When I run the forAll code, I've noticed that the same value is tried more than once, e.g.
String tested: hi
String tested: hello
String tested: hi
String tested: hello
String tested: hi
String tested: hello
...
I was wondering if there is a way for, given the code above, for each value in oneOf to be tried only once. In other words, to get ScalaTest not to use the same value twice.
Even if I used other generators, such as Gen.alphaStr, I'd like to find a way to avoid testing the same String twice. The reason I'm interested in doing this is because each test runs against a server running in a different process, and hence there's a bit of cost involved, so I'd like to avoid testing the same thing twice.
What you're trying to do is seems to be against scalacheck ideology(see Note1); however it's kind of possible (with high probability) by reducing the number of samples:
scala> forAll(oneOf("a", "b")){i => println(i); true}.check(Test.Parameters.default.withMinSuccessfulTests(2))
a
b
+ OK, passed 2 tests.
Note that you can still get aa/bb sometimes, as scala-check is built on randomness and statistical approach. If you need to always check all combinations - you probably don't need scala-check:
scala> assert(Set("a", "b").forall(_ => true))
Basically Gen allows you to create an infinite collection that represents a distribution of input values. The more values you generate - the better sampling you get. So if you have N possible states, you can't guarantee that they won't repeat in an infinite collection.
The only way to do exactly what you want is to explicitly check for duplicates before calling the service. You can use something like Option(ConcurrentHashMap.putIfAbscent(value, value)).isEmpty for that. Keep in mind it is a risk of OOM so be careful to take care of the amount of generated values and maybe even add an explicit check.
Note1) What scalacheck is needed for reducing number of combinations from maximum (which is more than 100) to some value that still gives you a good check. So scalacheck is useful when a set of possible inputs is really huge. And in that case the probability of repetitions is really small
P.S.
Talking about oneOf (from scaladoc):
def oneOf[T](t0: T, t1: T, tn: T*): Gen[T]
Picks a random value from a list
See also (examples are a bit outdated): How can I reduce the number of test cases ScalaCheck generates?
I would aim to increase the entropy of values. Using random sentences will increase it a lot, although not (theoretically) fixing the issue.
val genWord = Gen.onOf("hi", "hello")
def sentanceOf(words: Int): Gen[String] = {
Gen.listOfN(words, genWord).map(_.mkString(" ")
}

Why not mark val or var variables as private?

Coming from a java background I always mark instance variables as private. I'm learning scala and almost all of the code I have viewed the val/var instances have default (public) access. Why is this the access ? Does it not break information hiding/encapsulation principle ?
It would help it you specified which code, but keep in mind that some example code is in a simplified form to highlight whatever it is that the example is supposed to show you. Since the default access is public, that means that you often get the modifiers left off for simplicity.
That said, since a val is immutable, there's not much harm in leaving it public as long as you recognize that this is now part of the API for your class. That can be perfectly okay:
class DataThingy(data: Array[Double) {
val sum = data.sum
}
Or it can be an implementation detail that you shouldn't expose:
class Statistics(data: Array[Double]) {
val sum = data.sum
val sumOfSquares = data.map(x => x*x).sum
val expectationSquared = (sum * sum)/(data.length*data.length)
val expectationOfSquare = sumOfSquares/data.length
val varianceOfSample = expectationOfSquare - expectationSquared
val standardDeviation = math.sqrt(data.length*varianceOfSample/(data.length-1))
}
Here, we've littered our class with all of the intermediate steps for calculating standard deviation. And this is especially foolish given that this is not the most numerically stable way to calculate standard deviation with floating point numbers.
Rather than merely making all of these private, it is better style, if possible, to use local blocks or private[this] defs to perform the intermediate computations:
val sum = data.sum
val standardDeviation = {
val sumOfSquares = ...
...
math.sqrt(...)
}
or
val sum = data.sum
private[this] def findSdFromSquares(s: Double, ssq: Double) = { ... }
val standardDeviation = findMySD(sum, data.map(x => x*x).sum)
If you need to store a calculation for later use, then private val or private[this] val is the way to go, but if it's just an intermediate step on the computation, the options above are better.
Likewise, there's no harm in exposing a var if it is a part of the interface--a vector coordinate on a mutable vector for instance. But you should make them private (better yet: private[this], if you can!) when it's an implementation detail.
One important difference between Java and Scala here is that in Java you can not replace a public variable with getter and setter methods (or vice versa) without breaking source and binary compatibility. In Scala you can.
So in Java if you have a public variable, the fact that it's a variable will be exposed to the user and if you ever change it, the user has to change his code. In Scala you can replace a public var with a getter and setter method (or a public val with just a getter method) without the user ever knowing the difference. So in that sense no implementation details are exposed.
As an example, let's consider a rectangle class:
class Rectangle(val width: Int, val height:Int) {
val area = width * height
}
Now what happens if we later decide that we don't want the area to be stored as a variable, but rather it should be calculated each time it's called?
In Java the situation would be like this: If we had used a getter method and a private variable, we could just remove the variable and change the getter method to calculate the area instead of using the variable. No changes to user code needed. But since we've used a public variable, we are now forced to break user code :-(
In Scala it's different: we can just change the val to def and that's it. No changes to user code needed.
Actually, some Scala developers tend to use default access too much. But you can find appropriate examples in famous Scala projects(for example, Twitter's Finagle).
On the other hand, creating objects as immutable values is the standard way in Scala. We don't need to hide all the attributes if they're immutable completely.
I'd like to answer the question with a bit more generic approach. I think the answer you are looking for has to do with the design paradigms on which Scala is built. Instead of the classical prodecural / object oriented approach, like you see in Java, functional programming is used to a much higher extend. I cannot cover all the code that you mention of course, but in general (well written) Scala code will not need a lot of mutability.
As pointed out by Rex, val's are immutable, so there are few reasons for them to not be public. But as I see it the immutability is not a goal in itself, but a result of functional programming. So if we consider functions as something like x -> function -> y the function part becomes somewhat of a black box; we don't really care what it does, as long as it does it correctly. As the Haskell Wiki writes:
Purely functional programs typically operate on immutable data. Instead of altering existing values, altered copies are created and the original is preserved.
This also explains the missing closure, since the parts we traditionally wanted to hide away is executed in the functions and thus hidden anyway.
So, to cut things short, I would argue that mutability and closure has become more redundant in Scala. And why clutter things up with getters and setter when it can be avoided?

Configuration data in Scala -- should I use the Reader monad?

How do I create a properly functional configurable object in Scala? I have watched Tony Morris' video on the Reader monad and I'm still unable to connect the dots.
I have a hard-coded list of Client objects:
class Client(name : String, age : Int){ /* etc */}
object Client{
//Horrible!
val clients = List(Client("Bob", 20), Client("Cindy", 30))
}
I want Client.clients to be determined at runtime, with the flexibility of either reading it from a properties file or from a database. In the Java world I'd define an interface, implement the two types of source, and use DI to assign a class variable:
trait ConfigSource {
def clients : List[Client]
}
object ConfigFileSource extends ConfigSource {
override def clients = buildClientsFromProperties(Properties("clients.properties"))
//...etc, read properties files
}
object DatabaseSource extends ConfigSource { /* etc */ }
object Client {
#Resource("configuration_source")
private var config : ConfigSource = _ //Inject it at runtime
val clients = config.clients
}
This seems like a pretty clean solution to me (not a lot of code, clear intent), but that var does jump out (OTOH, it doesn't seem to me really troublesome, since I know it will be injected once-and-only-once).
What would the Reader monad look like in this situation and, explain it to me like I'm 5, what are its advantages?
Let's start with a simple, superficial difference between your approach and the Reader approach, which is that you no longer need to hang onto config anywhere at all. Let's say you define the following vaguely clever type synonym:
type Configured[A] = ConfigSource => A
Now, if I ever need a ConfigSource for some function, say a function that gets the n'th client in the list, I can declare that function as "configured":
def nthClient(n: Int): Configured[Client] = {
config => config.clients(n)
}
So we're essentially pulling a config out of thin air, any time we need one! Smells like dependency injection, right? Now let's say we want the ages of the first, second and third clients in the list (assuming they exist):
def ages: Configured[(Int, Int, Int)] =
for {
a0 <- nthClient(0)
a1 <- nthClient(1)
a2 <- nthClient(2)
} yield (a0.age, a1.age, a2.age)
For this, of course, you need some appropriate definition of map and flatMap. I won't get into that here, but will simply say that Scalaz (or Rúnar's awesome NEScala talk, or Tony's which you've seen already) gives you all you need.
The important point here is that the ConfigSource dependency and its so-called injection are mostly hidden. The only "hint" that we can see here is that ages is of type Configured[(Int, Int, Int)] rather than simply (Int, Int, Int). We didn't need to explicitly reference config anywhere.
As an aside, this is the way I almost always like to think about monads: they hide their effect so it's not polluting the flow of your code, while explicitly declaring the effect in the type signature. In other words, you needn't repeat yourself too much: you say "hey, this function deals with effect X" in the function's return type, and don't mess with it any further.
In this example, of course the effect is to read from some fixed environment. Another monadic effect you might be familiar with include error-handling: we can say that Option hides error-handling logic while making the possibility of errors explicit in your method's type. Or, sort of the opposite of reading, the Writer monad hides the thing we're writing to while making its presence explicit in the type system.
Now finally, just as we normally need to bootstrap a DI framework (somewhere outside our usual flow of control, such as in an XML file), we also need to bootstrap this curious monad. Surely we'll have some logical entry point to our code, such as:
def run: Configured[Unit] = // ...
It ends up being pretty simple: since Configured[A] is just a type synonym for the function ConfigSource => A, we can just apply the function to its "environment":
run(ConfigFileSource)
// or
run(DatabaseSource)
Ta-da! So, contrasting with the traditional Java-style DI approach, we don't have any "magic" occurring here. The only magic, as it were, is encapsulated in the definition of our Configured type and the way it behaves as a monad. Most importantly, the type system keeps us honest about which "realm" dependency injection is occurring in: anything with type Configured[...] is in the DI world, and anything without it is not. We simply don't get this in old-school DI, where everything is potentially managed by the magic, so you don't really know which portions of your code are safe to reuse outside of a DI framework (for example, within your unit tests, or in some other project entirely).
update: I wrote up a blog post which explains Reader in greater detail.