Scala - initialization order of vals

I have this piece of code that loads Properties from a file:
class Config {
  val properties: Properties = {
    val p = new Properties()
    p.load(Thread.currentThread().getContextClassLoader.getResourceAsStream("props"))
    p
  }
  val forumId = properties.get("forum_id")
}
This seems to be working fine.
I have tried moving the initialization of properties into another val, loadedProperties, like this:
class Config {
  val properties: Properties = loadedProps
  val forumId = properties.get("forum_id")
  private val loadedProps = {
    val p = new Properties()
    p.load(Thread.currentThread().getContextClassLoader.getResourceAsStream("props"))
    p
  }
}
But it doesn't work! (properties is null in properties.get("forum_id")).
Why would that be? Isn't loadedProps evaluated when referenced by properties?
Secondly, is this a good way to initialize variables that require non-trivial processing? In Java, I would declare them final fields, and do the initialization-related operations in the constructor.
Is there a pattern for this scenario in Scala?
Thank you!

Vals are initialized in the order they are declared (well, strictly speaking, non-lazy vals are), so properties is getting initialized before loadedProps. In other words, loadedProps is still null when properties is being initialized.
The simplest solution here is to define loadedProps before properties:
class Config {
  private val loadedProps = {
    val p = new Properties()
    p.load(Thread.currentThread().getContextClassLoader.getResourceAsStream("props"))
    p
  }
  val properties: Properties = loadedProps
  val forumId = properties.get("forum_id")
}
You could also make loadedProps lazy, meaning that it will be initialized on its first access:
class Config {
  val properties: Properties = loadedProps
  val forumId = properties.get("forum_id")
  private lazy val loadedProps = {
    val p = new Properties()
    p.load(Thread.currentThread().getContextClassLoader.getResourceAsStream("props"))
    p
  }
}
Using lazy val has the advantage that your code is more robust to refactoring, as merely changing the declaration order of your vals won't break your code.
Also, in this particular occurrence, you can just turn loadedProps into a def (as suggested by @NIA), as it is only used once anyway.

I think loadedProps can simply be turned into a method here, just by replacing val with def:
private def loadedProps = {
  // Tons of code
}
In this case you can be sure that it is evaluated when you call it.
I'm not sure whether it counts as a pattern for this case, though.
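Spelled out with the loading code from the question, the def version would look roughly like this (just a sketch; the import is added for completeness):
import java.util.Properties

class Config {
  // loadedProps is a method now, so the declaration order of the vals no longer matters
  val properties: Properties = loadedProps
  val forumId = properties.get("forum_id")

  private def loadedProps: Properties = {
    val p = new Properties()
    p.load(Thread.currentThread().getContextClassLoader.getResourceAsStream("props"))
    p
  }
}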

Just an addition with a little more explanation:
Your properties field is initialized earlier than the loadedProps field here. null is a field's value before initialization - that's why you see it. In the def case it's just a method call instead of an access to a not-yet-initialized field, so everything is fine (a method's body may be called several times - there is no initialization involved). See http://docs.scala-lang.org/tutorials/FAQ/initialization-order.html. You can use def or lazy val to fix it.
Why is def so different? Because a def may be called several times, whereas a val is evaluated only once (so its first and only evaluation is effectively the initialization of the field).
A lazy val is initialized only when you first access it, so it would also help.
Another, simpler example of what's going on:
scala> class A {val a = b; val b = 5}
<console>:7: warning: Reference to uninitialized value b
class A {val a = b; val b = 5}
^
defined class A
scala> (new A).a
res2: Int = 0 //null
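For comparison, making the forward-referenced val lazy gives the expected value (a small sketch of that fix):
class A { val a = b; lazy val b = 5 }
println((new A).a) // prints 5: reading b forces the lazy initializer instead of seeing a default value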
More generally: in theory Scala could analyze the dependency graph between fields (which field needs which other field) and start initialization from the leaf nodes. But in practice every module is compiled separately, and the compiler might not even know those dependencies (it might be Java calling Scala calling Java), so it just does sequential initialization.
Because of that, it can't even detect simple loops:
scala> class A {val a: Int = b; val b: Int = a}
<console>:7: warning: Reference to uninitialized value b
class A {val a: Int = b; val b: Int = a}
^
defined class A
scala> (new A).a
res4: Int = 0
scala> class A {lazy val a: Int = b; lazy val b: Int = a}
defined class A
scala> (new A).a
java.lang.StackOverflowError
Actually, such a loop (inside one module) could in theory be detected by a separate build step, but it wouldn't help much, as such loops are pretty obvious anyway.

Is there any way, in Scala, to get the Singleton type of something from the more general type?

I have a situation where I'm trying to use implicit resolution on a singleton type. This works perfectly fine if I know that singleton type at compile time:
object Main {
  type SS = String with Singleton
  trait Entry[S <: SS] {
    type out
    val value: out
  }
  implicit val e1 = new Entry["S"] {
    type out = Int
    val value = 3
  }
  implicit val e2 = new Entry["T"] {
    type out = String
    val value = "ABC"
  }
  def resolve[X <: SS](s: X)(implicit x: Entry[X]): x.value.type = {
    x.value
  }
  def main(args: Array[String]): Unit = {
    resolve("S") //implicit found! No problem
  }
}
However, if I don't know this type at compile time, then I run into issues.
def main(args: Array[String]): Unit = {
  val string = StdIn.readLine()
  resolve(string) //Can't find implicit because it doesn't know the singleton type at runtime.
}
Is there any way I can get around this? Maybe some method that takes a String and returns the singleton type of that string?
def getSingletonType[T <: SS](string: String): T = ???
Then maybe I could do
def main(args: Array[String]): Unit = {
  val string = StdIn.readLine()
  resolve(getSingletonType(string))
}
Or is this just not possible? Maybe you can only do this sort of thing if you know all of the information at compile-time?
If you knew about all possible implementations of Entry at compile time - which would be possible only if it were sealed - then you could use a macro to create a map/partial function String -> Entry[_].
Since this is open to extension, I'm afraid at best some runtime reflection would have to scan the whole classpath to find all possible implementations.
But even then you would have to embed this String literal somehow into each implementation, because JVM bytecode knows nothing about the mapping between singleton types and implementations - only the Scala compiler does. And then use that to find whether among all implementations there is one (and exactly one) that matches your value - with implicits, if two of them appear at once in the same scope, compilation fails, but you can have more than one implementation as long as they don't appear together in the same scope. Runtime reflection would be global, so it wouldn't be able to avoid conflicts.
So no, there is no good solution for making this compile-time dispatch dynamic. You could create such a dispatch yourself, e.g. by writing a Map[String, Entry[_]] and using its get function to handle missing pieces.
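A minimal sketch of that manual dispatch, assuming import Main._ is in scope (the entries map and the lookup helper are made-up names, not part of the original code):
val entries: Map[String, Entry[_]] = Map("S" -> e1, "T" -> e2)

def lookup(s: String): Option[Any] =
  entries.get(s).map(e => e.value) // None for strings with no registered Entry

lookup("S") // Some(3)
lookup("X") // None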
Normally implicits are resolved at compile time. But val string = StdIn.readLine() becomes known only at runtime. In principle, you can postpone implicit resolution until runtime, but then you'll only be able to apply the results of such resolution at runtime, not at compile time (static types etc.).
object Entry {
  implicit val e1 = ...
  implicit val e2 = ...
}
import scala.reflect.runtime.universe._
import scala.reflect.runtime
import scala.tools.reflect.ToolBox
val toolbox = ToolBox(runtime.currentMirror).mkToolBox()
def resolve(s: String): Any = {
  val typ = appliedType(
    typeOf[Entry[_]].typeConstructor,
    internal.constantType(Constant(s))
  )
  val instanceTree = toolbox.inferImplicitValue(typ, silent = false)
  val instance = toolbox.eval(toolbox.untypecheck(instanceTree)).asInstanceOf[Entry[_]]
  instance.value
}
resolve("S") // 3
val string = StdIn.readLine()
resolve(string)
// 3 if you enter S
// ABC if you enter T
// "scala.tools.reflect.ToolBoxError: implicit search has failed" otherwise
Please notice that I put the implicits into the companion object of the type class in order to make them available in the implicit scope and therefore in the toolbox scope. Otherwise the code should be modified slightly:
object EntryImplicits {
  implicit val e1 = ...
  implicit val e2 = ...
}
// val instanceTree = toolbox.inferImplicitValue(typ, silent = false)
// should be replaced with
val instanceTree =
  q"""
    import path.to.EntryImplicits._
    implicitly[$typ]
  """
In your code import path.to.EntryImplicits._ is import Main._.

scala: construction order and early definitions (inheritance)

I am reading "Scala for the Impatient" and in 8.10 there is an example:
class Animal {
  val range: Int = 10
  val env: Array[Int] = new Array[Int](range)
}
class Ant extends Animal {
  override val range: Int = 2
}
The author explains why the env ends up being an empty Array[Int]:
[..]
3. The Animal constructor, in order to initialize the env array, calls the range() getter.
4. That method is overridden to yield the (as yet uninitialized) range field of the Ant class.
5. The range method returns 0. (That is the initial value of all integer fields when an object is allocated.)
6. env is set to an array of length 0.
7. The Ant constructor continues, setting its range field to 2. [..]
I don't understand the 4th step, and therefore the following steps are also not clear. The range() method is overridden to return 2, so why isn't range already set to 2 in the 4th step?
Does it work this way: when a val is overridden, it becomes uninitialized, and every val that uses the overridden val is affected as well? Is that correct? If so, why is there different behaviour with def, as outlined here? Why is a def defined before the constructor call and a val after?
After your comment, I decided to actually look at what exactly is written in the book. After reading the explanation, I decided that I cannot express it any clearer. So instead, I propose to take a look at the completely desugared code, for which there was no place in the short book.
Save this as a Scala script:
class Animal {
  val range: Int = 10
  val env: Array[Int] = new Array[Int](range)
}
class Ant extends Animal {
  override val range: Int = 2
}
val ant = new Ant
println(ant.range)
println(ant.env.size)
and then run it with the -print option:
> scala -nc -print yourScript.scala
you should see something like this:
class anon$1$Animal extends Object {
  private[this] val range: Int = _;
  <stable> <accessor> def range(): Int = anon$1$Animal.this.range;
  private[this] val env: Array[Int] = _;
  <stable> <accessor> def env(): Array[Int] = anon$1$Animal.this.env;
  <synthetic> <paramaccessor> <artifact> protected val $outer: <$anon: Object> = _;
  <synthetic> <stable> <artifact> def $outer(): <$anon: Object> = anon$1$Animal.this.$outer;
  def <init>($outer: <$anon: Object>): <$anon: Object> = {
    if ($outer.eq(null))
      throw null
    else
      anon$1$Animal.this.$outer = $outer;
    anon$1$Animal.super.<init>();
    anon$1$Animal.this.range = 10;
    anon$1$Animal.this.env = new Array[Int](anon$1$Animal.this.range());
    ()
  }
};
class anon$1$Ant extends <$anon: Object> {
  private[this] val range: Int = _;
  override <stable> <accessor> def range(): Int = anon$1$Ant.this.range;
  <synthetic> <stable> <artifact> def $outer(): <$anon: Object> = anon$1$Ant.this.$outer;
  def <init>($outer: <$anon: Object>): <$anon: anon$1$Animal> = {
    anon$1$Ant.super.<init>($outer);
    anon$1$Ant.this.range = 2;
    ()
  }
}
This is the desugared code as it is seen by the compiler in later stages of compilation. It's a bit hard to read, but what is important are these declarations:
// in Animal:
private[this] val range: Int = _;
<stable> <accessor> def range(): Int = anon$1$Animal.this.range;
// in Ant:
private[this] val range: Int = _;
override <stable> <accessor> def range(): Int =
anon$1$Ant.this.range;
and also the statement in the initializer of Animal:
anon$1$Animal.this.env = new Array[Int](anon$1$Animal.this.range())
What you can see here is that there are actually two different variables range: one is Animal.this.range and the other is Ant.this.range. Moreover, there are completely separate defs which are also called range in the desugared code: these are the getters which are generated automatically for vals.
The first variable is indeed initialized in Animal and set to 10:
anon$1$Animal.this.range = 10;
However, this does not matter, because the env is initialized using the getter range(), which is overridden to return Ant.this.range. The variable Ant.this.range is assigned the value 2 once, but after the initializer of Animal has completed. During the initialization of Animal, the variable Ant.this.range holds the default value 0, hence the counter-intuitive result.
If you simplify the desugared code a little bit, you obtain a compilable and readable example that behaves in the same way:
class Animal {
  private[this] var _Animal_range: Int = 0
  def range: Int = _Animal_range
  _Animal_range = 10
  val env: Array[Int] = new Array[Int](range)
}
class Ant extends Animal {
  private[this] var _Ant_range: Int = 0
  override def range: Int = _Ant_range
  _Ant_range = 2
}
val ant = new Ant
println(ant.range)
println(ant.env.size)
Here, the same happens:
1. _Animal_range is allocated with default value 0.
2. _Ant_range is allocated with default value 0.
3. The Animal base class begins initialization.
4. _Animal_range is initialized with value 10.
5. To initialize env, the getter range is invoked. It is overridden in the Ant class, and returns _Ant_range, which is still 0.
6. env is set to an empty array.
7. The Animal base class finishes initialization.
8. Ant begins initialization.
9. Only now does it set _Ant_range to 2.
This is why both code snippets print 2 and 0.
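Since the question title mentions early definitions: that is the fix this book section builds up to. In Scala 2 you can initialize the overriding val before the Animal constructor runs (note that early initializers are deprecated in 2.13 and were removed in Scala 3):
class Ant extends {
  override val range: Int = 2
} with Animal

println((new Ant).env.size) // 2: range is already 2 when env is initialized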
Hope that helps.
defs are evaluated when called, but vals are kept in memory, so in this case the val version is still set to zero because it hasn't been initialised yet, whereas the def version is evaluated on demand and therefore can never give a misleading value.
It's only because the val here is overridden that it isn't initialised in time, because classes must be initialised starting at the top of the inheritance hierarchy and working down. If env were a def you'd also be fine, because it wouldn't be evaluated until called, by which time the vals would all be initialised. Of course, that way you'd get a different array every time you called env, so you might prefer a lazy val, which is initialised the first time it's accessed but then keeps the same value in memory.

Spark serialization error mystery

Let's say I have the following code:
class Context {
  def compute() = Array(1.0)
}
val ctx = new Context
val data = ctx.compute
Now we are running this code in Spark:
val rdd = sc.parallelize(List(1,2,3))
rdd.map(_ + data(0)).count()
The code above throws org.apache.spark.SparkException: Task not serializable. I'm not asking how to fix it, by extending Serializable or making a case class; I want to understand why the error happens.
The thing that I don't understand is why it complains about the Context class not being Serializable, even though it's not part of the lambda: rdd.map(_ + data(0)). data here is an Array of values which should be serialized, but it seems that the JVM captures the ctx reference as well, which, in my understanding, should not be happening.
As I understand it, in the shell Spark should clean the lambda of the REPL context. If we print the tree after the delambdafy phase, we see these pieces:
object iw extends Object {
  ...
  private[this] val ctx: $line11.iw$Context = _;
  <stable> <accessor> def ctx(): $line11.iw$Context = iw.this.ctx;
  private[this] val data: Array[Double] = _;
  <stable> <accessor> def data(): Array[Double] = iw.this.data;
  ...
}
class anonfun$1 ... {
  final def apply(x$1: Int): Double = anonfun$1.this.apply$mcDI$sp(x$1);
  <specialized> def apply$mcDI$sp(x$1: Int): Double = x$1.+(iw.this.data().apply(0));
  ...
}
So the decompiled lambda code that is sent to the worker node is: x$1.+(iw.this.data().apply(0)). The iw.this part belongs to the Spark shell session, so, as I understand it, it should be cleaned by the ClosureCleaner, since it has nothing to do with the logic and shouldn't be serialized. Anyway, calling iw.this.data() returns the Array[Double] value of the data variable, which is initialized in the constructor:
def <init>(): type = {
  iw.super.<init>();
  iw.this.ctx = new $line11.iw$Context();
  iw.this.data = iw.this.ctx().compute(); // <== here
  iw.this.res4 = ...
  ()
}
In my understanding the ctx value has nothing to do with the lambda, it's not part of the closure, and hence shouldn't be serialized. What am I missing or misunderstanding?
This has to do with what Spark considers it can safely use as a closure. This is in some cases not very intuitive, since Spark uses reflection and in many cases can't recognize some of Scala's guarantees (it's not a full compiler or anything) or the fact that some variables in the same object are irrelevant. For safety, Spark will attempt to serialize any objects referenced, which in your case includes iw, which is not serializable.
The code inside ClosureCleaner has a good example:
For instance, transitive cleaning is necessary in the following
scenario:
class SomethingNotSerializable {
  def someValue = 1
  def scope(name: String)(body: => Unit) = body
  def someMethod(): Unit = scope("one") {
    def x = someValue
    def y = 2
    scope("two") { println(y + 1) }
  }
}
In this example, scope "two" is not serializable because it references scope "one", which references SomethingNotSerializable. Note that, however, the body of scope "two" does not actually depend on SomethingNotSerializable. This means we can safely null out the parent pointer of a cloned scope "one" and set it the parent of scope "two", such that scope "two" no longer references SomethingNotSerializable transitively.
Probably the easiest fix is to create a local variable, in the same scope, that extracts the value from your object, such that there is no longer any reference to the encapsulating object inside the lambda:
val rdd = sc.parallelize(List(1,2,3))
val data0 = data(0)
rdd.map(_ + data0).count()

How does serialization of lazy fields work?

I know the benefits of lazy fields when postponed evaluation of values is needed for some reason. I was wondering what the behavior of lazy fields is in terms of serialization.
Consider the following class.
class MyClass {
  lazy val myLazyVal = {...}
  ...
}
Questions:
1. If an instance of MyClass is serialized, does the lazy field get serialized too?
2. Does the behavior of serialization change depending on whether the field has been accessed before the serialization? I mean, if I don't cause the evaluation of the field, is it considered null?
3. Does the serialization mechanism provoke an implicit evaluation of the lazy field?
4. Is there a simple way to avoid serializing the variable and have the value recomputed lazily once more after deserialization? This should happen independently of whether the field has been evaluated.
Answers
1. Yes, if the field was already initialized; if not, you can treat it like a method. A value that was never computed is not serialized, but it is still available after deserialization.
2. If you didn't touch the field, it's serialized almost as if it were a simple def method; its type doesn't need to be serializable itself, and it will be recalculated after deserialization.
3. No.
4. You can add @transient before the lazy val definition in my code example; as I understand it, that will do exactly what you want.
Code to prove it:
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.concurrent.atomic.AtomicInteger

object LazySerializationTest extends App {
  def serialize(obj: Any): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.toByteArray
  }

  def deSerialise(bytes: Array[Byte]): MyClass = {
    new ObjectInputStream(new ByteArrayInputStream(bytes)).
      readObject().asInstanceOf[MyClass]
  }

  def test(obj: MyClass): Unit = {
    val bytes = serialize(obj)
    val fromBytes = deSerialise(bytes)
    println(s"Original cnt = ${obj.x.cnt}")
    println(s"De Serialized cnt = ${fromBytes.x.cnt}")
  }

  object X {
    val cnt = new AtomicInteger()
  }

  class X {
    // Not Serializable
    val cnt = X.cnt.incrementAndGet
    println(s"Create instance of X #$cnt")
  }

  class MyClass extends Serializable {
    lazy val x = new X
  }

  // Not initialized
  val mc1 = new MyClass
  test(mc1)

  // Force lazy evaluation
  val mc2 = new MyClass
  mc2.x
  test(mc2) // Fails with NotSerializableException
}
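To illustrate answer 4, the only change needed in the example above is the @transient annotation on the lazy val; MyClass would then serialize even after x has been forced, and x would be recomputed lazily (creating a new X) on the deserialized copy:
class MyClass extends Serializable {
  @transient lazy val x = new X // cached value is skipped by serialization and rebuilt on first access afterwards
}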

How to create a Scala class with private field with public getter, and primary constructor taking a parameter of the same name

Search results so far have led me to believe this is impossible without either a non-primary constructor
class Foo { // NOT OK: 2 extra lines--doesn't leverage Scala's conciseness
  private var _x = 0
  def this(x: Int) { this(); _x = x }
  def x = _x
}
val f = new Foo(x = 123) // OK: named parameter is 'x'
or sacrificing the name of the parameter in the primary constructor (making calls using named parameters ugly)
class Foo(private var _x: Int) { // OK: concise
  def x = _x
}
val f = new Foo(_x = 123) // NOT OK: named parameter should be 'x' not '_x'
Ideally, one could do something like this:
class Foo(private var x: Int) { // OK: concise
  // make just the getter public
  public x
}
val f = new Foo(x = 123) // OK: named parameter is 'x'
I know named parameters are a new thing in the Java world, so it's probably not that important to most, but coming from a language where named parameters are more popular (Python), this issue immediately pops up.
So my question is: is this possible? (probably not), and if not, why is such an (in my opinion) important use case left uncovered by the language design? By that, I mean that the code either has to sacrifice clean naming or concise definitions, which is a hallmark of Scala.
P.S. Consider the case where a public field suddenly needs to be made private while keeping the getter public, in which case the developer has to change 1 line and add 3 lines to achieve the effect while keeping the interface identical:
class Foo(var x: Int) {} // no boilerplate
->
class Foo { // lots of boilerplate
  private var _x: Int = 0
  def this(x: Int) { this(); _x = x }
  def x = _x
}
Whether this is indeed a design flaw is rather debatable. One could argue that complicating the syntax to allow this particular use case is not worthwhile.
Also, Scala is after all a predominantly functional language, so vars should not appear that often in your programs, which again raises the question of whether this particular use case needs to be handled in a special way.
However, it seems that a simple solution to your problem would be to use an apply method in the companion object:
class Foo private(private var _x: Int) {
  def x = _x
}
object Foo {
  def apply(x: Int): Foo = new Foo(x)
}
Usage:
val f = Foo(x = 3)
println(f.x)
LATER EDIT:
Here is a solution similar to what you originally requested, but that changes the naming a bit:
class Foo(initialX: Int) {
  private var _x = initialX
  def x = _x
}
Usage:
val f = new Foo(initialX = 3)
The concept you are trying to express, an object whose state is mutable from within the object yet immutable from the perspective of other objects, would probably be expressed as an Akka actor within the context of an actor system. Outside the context of an actor system, it would seem to be a Java conception of what it means to be an object, transplanted to Scala.
import akka.actor.Actor
class Foo(var x: Int) extends Actor {
  import Foo._
  def receive = {
    case WhatIsX => sender ! x
  }
}
object Foo {
  object WhatIsX
}
Not sure about earlier versions, but in Scala 3 it can easily be implemented as follows:
// class with no-argument constructor
class Foo {
  // private field
  private var _x: Int = 0
  // public getter
  def x: Int = _x
  // public setter
  def x_=(newValue: Int): Unit =
    _x = newValue
  // auxiliary constructor
  def this(value: Int) =
    this()
    _x = value
}
Note
Any definition within the class body (the primary constructor) is public, unless you prepend it with the private modifier.
Append _= to a method name with a Unit return type to make it a setter.
A constructor parameter prefixed with neither val nor var is private.
Then it follows:
val noArgFoo = Foo() // no argument case
println(noArgFoo.x) // the public getter prints 0
val withArgFoo = Foo(5) // with argument case
println(withArgFoo.x) // the public getter prints 5
noArgFoo.x = 100 // use the public setter to update x value
println(noArgFoo.x) // the public getter prints 100
withArgFoo.x = 1000 // use the public setter to update x value
println(withArgFoo.x) // the public getter prints 1000
This solution is exactly what you asked for, in a principled way and without any ad hoc workaround such as companion objects and the apply method.