Hashcode doesn't change between reruns - scala

object Main extends App {
  var a = new AnyRef()
  println(a.hashCode)
}
I have this code in IntelliJ IDEA. I noticed that the hash code does not change between reruns. It doesn't even change if I restart the IDE or make light modifications to the code: I can rename the variable a or add a few more variables and I still get the same hash code.
Is it cached somewhere? Or does the OS simply allocate the same address to the variable? Are there any consequences of this?
I'd expect it to be new each time, since the OS should allocate a new address on each run.

The implementation for Object.hashCode() can vary between JVMs as long as it obeys the contract, which doesn't require the numbers to be different between runs. For HotSpot there is even an option (-XX:hashCode) to change the implementation.
HotSpot's default is to use a random number generator, so if you are using that (with no -XX:hashCode option) then it seems it uses the same seed on each run, resulting in the same sequence of hash codes. There's nothing wrong with that.
lmm's answer is not correct unless you are using HotSpot with -XX:hashCode=4 or another JVM that uses that technique by default. Even then I'm not at all certain (you can try it yourself by running HotSpot with -XX:hashCode=4 and seeing whether you get another value that also stays the same between runs).
Check out the code for the different options:
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/runtime/synchronizer.cpp#l555
There is a comment in there about making the "else" branch the default, which is the Xorshift pattern, which is indeed a pseudo-random number generator which will always provide the same sequence.
The answer from "apangin" on this question says that indeed this has become the default since JDK8 which explains the change from JDK7 you described in your comment.
I can confirm that this is correct, look at the JDK8 source:
http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/runtime/globals.hpp#l1127
--> Default value is now 5, which corresponds to the "else" branch (Xorshift).
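A quick way to experiment with this yourself (a minimal sketch, not from the original answer; HashCodeProbe is a made-up name, and on many JDK 8 builds -XX:hashCode is an experimental flag that needs -XX:+UnlockExperimentalVMOptions):

object HashCodeProbe extends App {
  // Run this several times under different settings and compare the output, e.g.
  //   scala -J-XX:+UnlockExperimentalVMOptions -J-XX:hashCode=0 HashCodeProbe
  // With the JDK 8 default (hashCode=5, Xorshift) the printed values repeat across runs.
  (1 to 3).foreach(_ => println(new AnyRef().hashCode))
}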

Some experiment:
scala> class A extends AnyRef
defined class A
scala> val a1 = new A
a1: A = A@5f6b1f19
scala> val a2 = new A
a2: A = A@d60aa4
scala> a1.hashCode
res19: Int = 1600855833
scala> a2.hashCode
res20: Int = 14027428
scala> val a3 = new AnyRef
a3: Object = java.lang.Object@16c3388e
scala> a3.hashCode
res21: Int = 381892750
So it appears that the AnyRef hash code is equal to the address of the object. If we get equal hashes, that means the object address is the same on every rerun. And that holds for me across two REPLs.
The API documentation says this about AnyRef's hashCode method:
The hashCode method for reference types. See hashCode in scala.Any.
And this about the method on Any:
Calculate a hash code value for the object.
The default hashing algorithm is platform dependent.
I guess the platform determines the location of the object and therefore the value of hashCode.

Any new process gets its own virtual address space from the OS. So while the process might exist at a different physical address each time the program runs, it will be mapped to the same virtual address each time. (ASLR exists, but I understand the JVM doesn't participate in it.) You can see this with e.g. a small C program containing a string constant (you might have to deliberately disable ASLR for that program): if you take a pointer to the string constant and print that pointer as an integer, it will be the same value every time.

hashCode() is not a random number. It is a digested result from analyzing some part of an object. Objects with the same values will, more than likely, have the same hash code. This is true for your case, since the "value" of an AnyRef with no fields is essentially empty.


When to use implicit parameters

I've been using Scala at work, and I have a question related to implicit parameters.
Often I've seen executionContext defined in method definitions and also in class definitions.
At the same time I've seen classes that accept case classes containing configuration data (timeout, adapter, port, etc.) as regular parameters.
My question is: when passing configuration, why is that parameter not defined as implicit?
Or, the other way around, what if executionContext were defined as a regular parameter?
I'm trying to understand when to use implicit parameters and when not to use them.
EDIT: maybe passing a case class is not the best example; it was just the first idea that came to mind.
Conceptually, implicits are something "external" to the application logic, and explicit parameters are ... well ... explicit.
Consider a function def f(x: Double): Double = x*x
It is a pure function that transforms a given real number into another real number. It makes sense for x to be an explicit parameter, as it is an intrinsic part of what this function is.
Now, suppose you were implementing some sort of approximate algorithm for multiplication, and wanted to control the precision with which your function computes the answer.
You could do def f(x: Double, precision: Int): Double = ???. It would work, but is inconvenient and kinda clumsy:
Function definition no longer expresses the conceptual "nature" of the function being a pure transformation on the set of real numbers
It makes it complicated at the call site, because everyone using your function must now be aware of this additional parameter to pass around (imagine you are writing a library for non-engineer math majors to use; they understand abstract transformations and complex formulas, but couldn't care less about numeric precision: how often do you think about precision when you need to compute the area of a square?).
It also makes existing code harder to read and modify
So, to make it prettier, you can do def f(x: Double)(implicit precision: Int) = ???. This has the advantage of saying exactly what you want: "I have a transformation double => double that will use the implied precision when the actual result is computed." Those math majors can now write their abstract formulas the way they are used to: val area = square(x), without polluting their logic with annoying configuration they don't really care about.
When to use this exactly is, certainly, a question of opinion and taste (which is expressly forbidden on SO). Someone could certainly argue about the above example that precision is actually part of the transformation's definition, because 5.429 and 5.4289 (the results of f(2.33)(3) and f(2.33)(4) respectively) are two different numbers.
So, at the end of the day, you just have to use your judgement and your common sense to make a decision for every case you come across.
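To make that concrete, here is a minimal sketch of the idea (Precision and the body of square are my own illustrative inventions, not part of the answer; using a dedicated wrapper type also avoids the problems of a bare implicit Int):

// Hypothetical names for illustration only.
final case class Precision(digits: Int)

def square(x: Double)(implicit p: Precision): Double = {
  // Pretend this is an approximate algorithm that honours p.digits.
  val factor = math.pow(10, p.digits)
  math.rint(x * x * factor) / factor
}

// The caller supplies the configuration once, away from the formulas:
implicit val defaultPrecision: Precision = Precision(3)
val area = square(2.33) // uses defaultPrecision implicitly, e.g. 5.429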
When using existing libraries, there is another consideration. Consider:
def foo(f: Future[Int], ec: ExecutionContext) =
  f.map { x => x*x }(ec)
    .map { _.toString }(ec)
    .foreach(println)(ec)
This would look a lot nicer and less messy if you made ec implicit, regardless of where you stand philosophically on whether to consider it a part of your transformation or not:
def foo(f: Future[Int])(implicit ec: ExecutionContext) =
  f.map { x => x*x }.map(_.toString).foreach(println)
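A possible call site, just to show where the implicit comes from (a script-style sketch that repeats the definition above so it runs on its own; ExecutionContext.global is used purely for illustration):

import scala.concurrent.{ExecutionContext, Future}

// One ExecutionContext in implicit scope serves every transformation inside foo.
implicit val ec: ExecutionContext = ExecutionContext.global

def foo(f: Future[Int])(implicit ec: ExecutionContext): Unit =
  f.map { x => x * x }.map(_.toString).foreach(println)

foo(Future(21)) // eventually prints "441"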
Implicits can be used when:
you need only one value of some type
it is unambiguous how such value would be defined
this includes both manual definition as well as using metaprogramming to generate the value based on e.g. how its type is defined
Futures and Akka decided that passing some "globals" as implicits is a reasonable use case, so they would pass as implicits:
ExecutionContext
ActorSystem, Materializer
various configs like Timeout
in general things which you don't want to be put into some static field, but which are passed around everywhere.
However, the rest of the Scala world tends to solve this issue with some abstraction that passes these things around under the hood: builders, constructors, abstractions over (dependencies) => result functions, etc.
E.g. cats.effect.IO doesn't need to pass an ExecutionContext around, because it passes its scheduler around when you run it; only when you want to explicitly change the pool things are run on do you have to call a dedicated method. In Monix, running things likewise requires you to pass a Scheduler at the end, once the whole computation is composed. So both approaches let you give up on threading all these ExecutionContexts through your code. In the case of Future it is necessary, because you need control over thread pools while also evaluating things eagerly, and passing the ec manually (futureA.flatMap(f)(ec)) would break for-comprehensions.
As a result, outside the Akka ecosystem and raw Futures, implicits are more often used to carry around type classes, as a means to decouple business logic from a particular implementation, to allow adding support for new types without modifying the code that uses these implementations, and so on. (There are tons of examples of type classes in Scala; a minimal sketch follows below.)
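A generic sketch of the type-class pattern (Show, describe and the instances are illustrative names of my own, not taken from any particular library):

// A type class: behaviour is defined per type, outside the types themselves.
trait Show[A] {
  def show(a: A): String
}

object Show {
  // Instances live in implicit scope; new types gain support without
  // touching the code that uses Show.
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = a.toString
  }
  implicit val boolShow: Show[Boolean] = new Show[Boolean] {
    def show(a: Boolean): String = if (a) "yes" else "no"
  }
}

def describe[A](a: A)(implicit s: Show[A]): String = s.show(a)

describe(42)   // "42"
describe(true) // "yes"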
Usually, when I read about people using implicits to pass configs around, it is just a matter of time before it ends in grief. Akka and ExecutionContexts kind of require them, but otherwise you should just pass configs explicitly. You can group them into case classes to pass a bunch of them around, and it is not that much of an issue. You can also gather everything that is required as an implicit into one place and do:
case class Configs(dbEC: EC, mapEC: EC)

class SomeBehavior(configs: Configs) {
  def someAction = {
    if (...) {
      implicit val ec: EC = configs.dbEC
      ...
    } else {
      implicit val ec: EC = configs.mapEC
      ...
    }
  }
}
to make them implicit only in the place that needs them. A good rule of thumb is: do you care that something is being passed around which you don't see right there in the code? Usually the answer is yes, you would prefer to see it; the only exceptions are cases where it is fairly obvious where the value comes from, or where you know the value will be fine and don't really care where it came from.
There are a multitude of use-cases of implicit in Scala: under the hood, they boil down to leveraging the compiler's implicit resolution mechanism to fill in things that might not have explicitly been mentioned, but the use-cases are divergent enough that in Scala 3, each use-case (of those that survive into Scala 3...) gets encoded with a different keyword.
In the case of the execution context, implicit arguments are being used to mimic dynamic scope in a language which is normally statically scoped. The primary win from doing this is that it allows behavior further down the call stack to be decided-upon much further up the call stack without having to always explicitly pass on the behavior through the intervening layers of the stack (while providing a way for those intervening layers to cleanly force a different behavior).
Historically, a major example of this was numeric precision. Many numeric operations end up being implemented through iterated refinement (e.g. when square root was implemented in software, it might be implemented using Newton's method), which means there's a trade-off between speed of calculation and precision of the result. With dynamic scoping, there's a neat way to accomplish this: a global variable for the desired level of precision in mathematical results. Your numeric routine checks the value of that variable and governs itself accordingly. The difference from globals in a statically-scoped language is that when A calls B which calls C, if A sets the value of x to 1 and B sets it to 2, x will be 2 when checked in C or B, but once B returns to A, x will once again be 1 (in dynamically scoped languages, you can think of a global variable as really being a name for a stack of values, and the language implementation automatically pops the stack as appropriate).
Dynamic scoping was once fairly popular (especially in Lisps before the mid/late 1970s); nowadays the only places you really see it are Bourne shells (including bash) and Emacs Lisp, while some languages (Perl and Common Lisp are probably the two main examples) are hybrids: a variable is declared in a special way to make it dynamically or statically scoped. Static scoping has pretty clearly won: it's easier for both the language implementation and the programmer to reason about.
The cost of that ease is that, in our numeric computation example, we end up with something like the following:
def newtonSqrt(x: Double, precision: Int): Double = ???

/** Calculates the length of the hypotenuse of a right triangle
 *  with legs of the given lengths.
 */
def hypotenuse(x: Double, y: Double, precision: Int): Double =
  newtonSqrt(x*x + y*y, precision)
Thankfully, Scala supports default arguments, so at least we avoid having to write extra overloads that bake in a default precision. Arguably, the precision parameter is exposing an implementation detail (the fact that our calculations aren't necessarily perfectly mathematically accurate): the important thing is that the length of the hypotenuse is the square root of the sum of the squares of the legs.
In Scala, we can make the precision implicit:
// DON'T ACTUALLY PASS AN INT IMPLICITLY!!!!!!
def newtonSqrt(x: Double)(implicit precision: Int): Double = ???

def hypotenuse(x: Double, y: Double)(implicit precision: Int): Double =
  newtonSqrt(x*x + y*y)
(It's actually really bad to ever pass a primitive or any type which could plausibly be used for something other than describing the behavior in question through the implicit mechanism: I'm doing it here for didactic clarity).
The compiler will effectively translate newtonSqrt(x*x + y*y) into (something very similar to) newtonSqrt(x*x + y*y)(precision). Now callers of hypotenuse can decide to fix the precision via an implicit val, or to defer the choice to their own callers by adding the implicit to their signature.
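For completeness, the two options at a call site might look like this (continuing the deliberately simplified Int implicit from above):

// Option 1: fix the precision here and now.
implicit val precision: Int = 8
val h = hypotenuse(3.0, 4.0) // the compiler supplies `precision`

// Option 2: defer the choice to our own callers.
def diagonal(side: Double)(implicit precision: Int): Double =
  hypotenuse(side, side)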
Dynamic scoping has long been controversial, so it's no surprise that even the constrained dynamic scoping this usage of implicit embeds is controversial. In Scala's case, it doesn't help that in many cases the tooling throws up its hands when it comes to helping you figure out implicits: most of the really furious compiler errors one encounters are related to missing implicits or collisions, and tracing to figure out which values are in the implicit scope at any time is not something the tooling has a history of helping people with. Thus there are many developers who have decided that explicitly threading through configuration is superior to using implicits.
It's largely a matter of taste and the situation whether this sort of behavior description is best passed implicitly or explicitly (and it's worth noting that the type-class pattern, especially without a hard requirement for coherence (that there be one and only one possible way to describe the behavior) as is typical in Scala, is just a special case of this behavior description).
I should also note that it isn't a binary choice between bundling a few settings into a case class vs. passing them implicitly: you can do both:
case class ProcessSettings(sys: ActorSystem, ec: ExecutionContext)

object ProcessSettings {
  implicit def implicitly(implicit sys: ActorSystem, ec: ExecutionContext): ProcessSettings =
    ProcessSettings(sys, ec)
}

def doStuff(x: SomeInput)(implicit settings: ProcessSettings)
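A hypothetical call site (assuming akka-actor on the classpath; SomeInput and the body of doStuff stay abstract above, so this is a sketch rather than runnable code):

import akka.actor.ActorSystem
import scala.concurrent.ExecutionContext

implicit val sys: ActorSystem = ActorSystem("example")
implicit val ec: ExecutionContext = sys.dispatcher

// The implicit def in ProcessSettings' companion assembles the bundle,
// so the caller never constructs it by hand:
// doStuff(someInput)  is expanded to  doStuff(someInput)(ProcessSettings(sys, ec))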

Scala (method / function).hashCode static value?

I was curious whether the hashCode() function always returns the same value for a lambda or the like in Scala.
My tests show a static value that does not change even across builds. Is this intended behavior, or may it change in the future?
If this were stable behavior it would help me a lot in building my library.
EDIT:
Let's take this source code:
object Main {
  def main(args: Array[String]): Unit = {
    val x = (s: String) => 1
    val y = (s: String) => 2
    println(x.hashCode())
    println(y.hashCode())
  }
}
Its output on the console is, for me, always 1792393294 and 226170135.
What I'm currently doing is implementing a parser combinator library, which I have already implemented in several languages. I need to know when wrapper classes are the same (i.e. the underlying functions are the same) so I can implement something like a call stack, which I need in order to parse as far as possible on failure while preventing endless recursion on error.
Thanks in advance!
The default implementation for hashCode (at least in the Oracle JVM) is in terms of the (initial) memory address of that particular object. So if your program has constructed exactly the same sized objects in exactly the same order before constructing that object, it will in practice return the same value every time.
But this is not at all reliable; most programs do not do exactly the same things every time they run. As soon as you're doing something like responding to user input, that perfect reproducibility will disappear - e.g. maybe you sometimes add enough entries to a HashMap to trigger an enlargement of the table, and sometimes not. And if you construct the same value later in the program, it will of course have a different address; try doing
val z = (s: String) => 1
and observe that it will have a different hashCode from x. Not to mention that the numbers may well be different across different JVMs, different versions of the same JVM, or even when the same JVM is launched with a different -Xms setting.
Computers are often a lot more deterministic in practice than in theory. But this is not the kind of thing that's specified to happen, and certainly not something to rely on in your programs.
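A quick way to see how fragile this is, even within a single run (a sketch building on the question's own example; HashCodeDemo is a made-up name):

object HashCodeDemo extends App {
  val x = (s: String) => 1
  val z = (s: String) => 1 // structurally identical to x

  // Two syntactically identical lambdas are still two distinct objects,
  // so their identity hash codes will almost certainly differ:
  println(x.hashCode() == z.hashCode()) // very likely false
  println(x == z)                       // false: lambdas have no structural equality
}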

Scala lazy collection growth

This question is a bit more theoretical.
I have an object which holds a private mutable list or map whose growth is append-only. I believe I could argue that the object itself is functional, being referentially transparent and, to all appearances, immutable. So, for example, I have
import scala.collection.mutable.Map

case class Foo(bar: String)

object FooContainer {
  private val foos = Map.empty[String, Foo]
  def getFoo(fooName: String) = foos.getOrElseUpdate(fooName, Foo(fooName))
}
I could do something similar with lists. Now, I imagine that to be truly functional I would need to ensure thread safety, which I suppose I could do simply by using locks, synchronized blocks or atomic references. My questions are: is this all necessary and sufficient, is it common practice, and are there established patterns for this behaviour?
I would say that "functional" is really the wrong term. The exact example shown above could be called "pure" in the sense that its output is only ever defined by its input, thus not having any (visible) side-effects. In reality it is impure though, since it has hidden internal state.
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices. Wikipedia
The apparent purity vanishes, though, if you make the Foo class mutable:
case class Foo(bar: String) {
  private var mutable: Int = 1
  def setFoo(x: Int) = mutable = x
  def foo = mutable
}
The mutability of the Foo class results in getFoo being impure:
scala> FooContainer.getFoo("test").foo
res5: Int = 1
scala> FooContainer.getFoo("bla").setFoo(5)
scala> FooContainer.getFoo("bla").foo
res7: Int = 5
In order for the function to be apparently pure, FooContainer.getFoo("bla").foo must always return the same value. Therefore, the "purity" of the described construct is fragile at best.
As a general rule, you're better off with a var holding an immutable collection type which is replaced when updated than you are with a val holding a mutable type. The former never deals in values that may change after being created (other than the class which holds the evolving value itself). Then you only need to be moderately careful about when updated values held by the var are published to it, but since updating a reference is generally atomic (apart from things like storing a long or a double to RAM on a 32-bit machine), there's little risk.
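A sketch of that pattern applied to the question's FooContainer (synchronized is used here for simplicity; an AtomicReference with a compare-and-set loop would work as well):

case class Foo(bar: String)

object FooContainer {
  // A var holding an immutable Map: readers always see a consistent snapshot.
  private var foos = Map.empty[String, Foo]

  def getFoo(fooName: String): Foo = synchronized {
    foos.get(fooName) match {
      case Some(foo) => foo
      case None =>
        val foo = Foo(fooName)
        foos = foos + (fooName -> foo) // publish the updated snapshot
        foo
    }
  }
}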

Why not mark val or var variables as private?

Coming from a Java background, I always mark instance variables as private. I'm learning Scala, and in almost all of the code I have viewed, the val/var members have default (public) access. Why is this the default? Does it not break the information hiding/encapsulation principle?
It would help if you specified which code, but keep in mind that example code is often shown in a simplified form to highlight whatever it is the example is supposed to show you. Since the default access is public, the modifiers are often left off for simplicity.
That said, since a val is immutable, there's not much harm in leaving it public as long as you recognize that this is now part of the API for your class. That can be perfectly okay:
class DataThingy(data: Array[Double]) {
  val sum = data.sum
}
Or it can be an implementation detail that you shouldn't expose:
class Statistics(data: Array[Double]) {
  val sum = data.sum
  val sumOfSquares = data.map(x => x*x).sum
  val expectationSquared = (sum * sum)/(data.length*data.length)
  val expectationOfSquare = sumOfSquares/data.length
  val varianceOfSample = expectationOfSquare - expectationSquared
  val standardDeviation = math.sqrt(data.length*varianceOfSample/(data.length-1))
}
Here, we've littered our class with all of the intermediate steps for calculating standard deviation. And this is especially foolish given that this is not the most numerically stable way to calculate standard deviation with floating point numbers.
Rather than merely making all of these private, it is better style, if possible, to use local blocks or private[this] defs to perform the intermediate computations:
val sum = data.sum
val standardDeviation = {
  val sumOfSquares = ...
  ...
  math.sqrt(...)
}
or
val sum = data.sum
private[this] def findSdFromSquares(s: Double, ssq: Double) = { ... }
val standardDeviation = findSdFromSquares(sum, data.map(x => x*x).sum)
If you need to store a calculation for later use, then private val or private[this] val is the way to go, but if it's just an intermediate step on the computation, the options above are better.
Likewise, there's no harm in exposing a var if it is a part of the interface--a vector coordinate on a mutable vector for instance. But you should make them private (better yet: private[this], if you can!) when it's an implementation detail.
One important difference between Java and Scala here is that in Java you can not replace a public variable with getter and setter methods (or vice versa) without breaking source and binary compatibility. In Scala you can.
So in Java if you have a public variable, the fact that it's a variable will be exposed to the user and if you ever change it, the user has to change his code. In Scala you can replace a public var with a getter and setter method (or a public val with just a getter method) without the user ever knowing the difference. So in that sense no implementation details are exposed.
As an example, let's consider a rectangle class:
class Rectangle(val width: Int, val height: Int) {
  val area = width * height
}
Now what happens if we later decide that we don't want the area to be stored as a variable, but rather it should be calculated each time it's called?
In Java the situation would be like this: If we had used a getter method and a private variable, we could just remove the variable and change the getter method to calculate the area instead of using the variable. No changes to user code needed. But since we've used a public variable, we are now forced to break user code :-(
In Scala it's different: we can just change the val to def and that's it. No changes to user code needed.
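For instance, the same Rectangle after that change (a minimal sketch of the idea):

class Rectangle(val width: Int, val height: Int) {
  // Was `val area = width * height`; now computed on each access.
  // Callers still write rect.area, so nothing on their side changes.
  def area: Int = width * height
}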
Actually, some Scala developers tend to use default access too much. But you can find appropriate examples in well-known Scala projects (for example, Twitter's Finagle).
On the other hand, creating objects as immutable values is the standard way in Scala. We don't need to hide all the attributes if they're completely immutable.
I'd like to answer the question with a slightly more general approach. I think the answer you are looking for has to do with the design paradigms on which Scala is built. Instead of the classical procedural / object-oriented approach you see in Java, functional programming is used to a much greater extent. I cannot cover all the code that you mention, of course, but in general (well written) Scala code will not need a lot of mutability.
As pointed out by Rex, vals are immutable, so there are few reasons for them not to be public. But as I see it, the immutability is not a goal in itself but a result of functional programming. So if we consider functions as something like x -> function -> y, the function part becomes somewhat of a black box; we don't really care what it does, as long as it does it correctly. As the Haskell Wiki writes:
Purely functional programs typically operate on immutable data. Instead of altering existing values, altered copies are created and the original is preserved.
This also explains the reduced need for encapsulation, since the parts we traditionally wanted to hide away are executed inside the functions and are thus hidden anyway.
So, to cut things short, I would argue that mutability and encapsulation have become largely redundant in Scala. And why clutter things up with getters and setters when it can be avoided?

When applying `map` to a `Set` you sometimes want the result not to be a set but overlook this

Or how to avoid accidental removal of duplicates when mapping a Set?
This is a mistake I make very often. Look at the following code:
def countSubelements[A](sl: Set[List[A]]): Int = sl.map(_.size).sum
The function shall count the accumulated size of all the contained lists. The problem is that after mapping the lists to their lengths, the result is still a Set and all lists of size 1 are reduced to a single representative.
Is it just me having this problem? Is there something I can do to prevent this happening? I think I'd love to have two methods mapToSet and mapToSeq for Set. But there is no way to enforce this, and sometimes you don't locally notice that you are working with a Set.
Maybe it's even possible that you were writing code for a Seq and something changes in another class and the underlying object becomes a Set?
Is there perhaps a best practice that prevents this situation from arising at all?
Remote edits break my code
Imagine the following situation:
val totalEdges = graph.nodes.map(_.getEdges).map(_.size).sum / 2
You fetch a collection of Node objects from a graph, use them to get their adjacent edges and sum over them. This works if graph.nodes returns a Seq.
And it breaks if someone decides to make Graph return its nodes as a Set, without this code ever looking suspicious (at least not to me; do you expect that every collection could possibly end up being a Set?) and without it being touched.
It seems there are many possible gotchas if one expects a Seq and gets a Set. It's not a surprise that method implementations can depend on the type of the object and (with overloading) on the arguments. With Scala implicits, a method can even depend on the expected return type.
A way to defend against surprises is to explicitly label types. For example, annotating methods with return types, even if it's not required. At least this way, if the type of graph.nodes is changed from Seq to Set, the programmer is aware that there's potential breakage.
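A small sketch of that defence (Node and Graph are stand-ins for the question's types, simplified so that edges are just numbers):

case class Node(getEdges: Seq[Int])
class Graph(val nodes: Seq[Node]) // if this ever becomes Set[Node]...

// ...the explicit annotation below turns a silent behaviour change into a compile error:
def totalEdges(graph: Graph): Int = {
  val sizes: Seq[Int] = graph.nodes.map(_.getEdges).map(_.size)
  sizes.sum / 2
}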
For your specific issue, why not define your own mapToSeq method:
scala> def mapToSeq[A, B](t: Traversable[A])(f: A => B): Seq[B] =
         t.map(f)(collection.breakOut)
mapToSeq: [A, B](t: Traversable[A])(f: A => B)Seq[B]
scala> mapToSeq(Set(Seq(1), Seq(1,2)))(_.sum)
res1: Seq[Int] = Vector(1, 3)
scala> mapToSeq(Seq(Seq(1), Seq(1,2)))(_.sum)
res2: Seq[Int] = Vector(1, 3)
The advantage of using breakOut (a CanBuildFrom) is that the conversion from a Set to a Seq incurs no additional overhead.
You can use the "pimp my library" pattern to make mapToSeq appear to be part of the Traversable trait, inherited by both Seq and Set, as sketched below.
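A possible enrichment (a sketch for pre-2.13 collections, to match the breakOut-based answer above; SetMapOps and TraversableOps are names of my own):

object SetMapOps {
  // After importing SetMapOps._, mapToSeq is available on any Traversable
  // (so on both Seq and Set) without touching the collection hierarchy.
  implicit class TraversableOps[A](t: Traversable[A]) {
    def mapToSeq[B](f: A => B): Seq[B] = t.map(f)(collection.breakOut)
  }
}

import SetMapOps._
Set(List(1), List(2), List(3, 4)).mapToSeq(_.size).sum // 4 (a plain map would give Set(1, 2).sum == 3)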
Workarounds:
def countSubelements[A](sl: Set[List[A]]): Int = sl.toSeq.map(_.size).sum
def countSubelements[A](sl: Set[List[A]]): Int = sl.foldLeft(0)(_ + _.size)