Scala: construction order and early definitions (inheritance)

I am reading "Scala for the Impatient" and in 8.10 there is an example:
class Animal {
  val range: Int = 10
  val env: Array[Int] = new Array[Int](range)
}
class Ant extends Animal {
  override val range: Int = 2
}
The author explains why the env ends up being an empty Array[Int]:
[..]
3. The Animal constructor, in order to initialize the env array, calls the range() getter.
4. That method is overridden to yield the (as yet uninitialized) range field of the Ant class.
5. The range method returns 0. (That is the initial value of all integer fields when an object is allocated.)
6. env is set to an array of length 0.
7. The Ant constructor continues, setting its range field to 2. [..]
I don't understand the 4th step, and therefore the following steps aren't clear either. The range() method is overridden to return 2, so why doesn't it already return 2 in the 4th step?
Does it work this way: when a val is overridden, it is left uninitialized during the superclass constructor, and every val that uses the overridden val is affected as well? Is that correct? If so, why is there different behaviour with def, as outlined here? Why is a def effectively in place before the constructor call, and a val only afterwards?

After your comment, I went and looked at what exactly is written in the book. After reading the explanation, I don't think I can express it any more clearly, so instead I propose to look at the completely desugared code, for which there was no room in a short book.
Save this as a Scala script:
class Animal {
  val range: Int = 10
  val env: Array[Int] = new Array[Int](range)
}
class Ant extends Animal {
  override val range: Int = 2
}
val ant = new Ant
println(ant.range)
println(ant.env.size)
and then run it with the -print option:
> scala -nc -print yourScript.scala
you should see something like this:
class anon$1$Animal extends Object {
  private[this] val range: Int = _;
  <stable> <accessor> def range(): Int = anon$1$Animal.this.range;
  private[this] val env: Array[Int] = _;
  <stable> <accessor> def env(): Array[Int] = anon$1$Animal.this.env;
  <synthetic> <paramaccessor> <artifact> protected val $outer: <$anon: Object> = _;
  <synthetic> <stable> <artifact> def $outer(): <$anon: Object> = anon$1$Animal.this.$outer;
  def <init>($outer: <$anon: Object>): <$anon: Object> = {
    if ($outer.eq(null))
      throw null
    else
      anon$1$Animal.this.$outer = $outer;
    anon$1$Animal.super.<init>();
    anon$1$Animal.this.range = 10;
    anon$1$Animal.this.env = new Array[Int](anon$1$Animal.this.range());
    ()
  }
};
class anon$1$Ant extends <$anon: Object> {
  private[this] val range: Int = _;
  override <stable> <accessor> def range(): Int = anon$1$Ant.this.range;
  <synthetic> <stable> <artifact> def $outer(): <$anon: Object> = anon$1$Ant.this.$outer;
  def <init>($outer: <$anon: Object>): <$anon: anon$1$Animal> = {
    anon$1$Ant.super.<init>($outer);
    anon$1$Ant.this.range = 2;
    ()
  }
}
This is the desugared code as it is seen by the compiler in later stages of compilation. It's a bit hard to read, but what is important are these declarations:
// in Animal:
private[this] val range: Int = _;
<stable> <accessor> def range(): Int = anon$1$Animal.this.range;
// in Ant:
private[this] val range: Int = _;
override <stable> <accessor> def range(): Int = anon$1$Ant.this.range;
and also the statement in the initializer of Animal:
anon$1$Animal.this.env = new Array[Int](anon$1$Animal.this.range())
What you can see here is that there are actually two different variables range: one is Animal.this.range and the other is Ant.this.range. Moreover, there are completely separate defs which are also called range in the desugared code: these are the getters which are generated automatically for vals.
The first variable is indeed initialized in Animal and set to 10:
anon$1$Animal.this.range = 10;
However, this does not matter, because the env is initialized using the getter range(), which is overridden to return Ant.this.range. The variable Ant.this.range is assigned the value 2 once, but after the initializer of Animal has completed. During the initialization of Animal, the variable Ant.this.range holds the default value 0, hence the counter-intuitive result.
If you simplify the desugared code a little bit, you obtain a compilable and readable example that behaves in the same way:
class Animal {
  private[this] var _Animal_range: Int = 0
  def range: Int = _Animal_range
  _Animal_range = 10
  val env: Array[Int] = new Array[Int](range)
}
class Ant extends Animal {
  private[this] var _Ant_range: Int = 0
  override def range: Int = _Ant_range
  _Ant_range = 2
}
val ant = new Ant
println(ant.range)
println(ant.env.size)
Here, the same happens:
1. _Animal_range is allocated with default value 0
2. _Ant_range is allocated with default value 0
3. The Animal base class begins initialization
4. _Animal_range is initialized with value 10
5. To initialize env, the getter range is invoked. It is overridden in the Ant class, and returns _Ant_range, which is still 0
6. env is set to an empty array
7. The Animal base class finishes initialization
8. Ant begins initialization
9. Only now does it set _Ant_range to 2.
This is why both code snippets print 2 and 0.
Hope that helps.
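Since the question's title mentions early definitions, here is a sketch of that fix as well, assuming Scala 2 only (the early-initializer syntax is deprecated in 2.13 and removed in Scala 3). The overridden val is assigned before the Animal constructor runs, so env gets the expected size:
class Animal {
  val range: Int = 10
  val env: Array[Int] = new Array[Int](range)
}
// range is set to 2 *before* the Animal constructor executes (Scala 2 early definition)
class Ant extends { override val range: Int = 2 } with Animal

println((new Ant).range)    // 2
println((new Ant).env.size) // 2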

defs are evaluated when called, but vals are kept in memory, so in this case the val version is still set to zero because it hasn't been initialised yet, whereas the def version is called on demand and therefore can never give a misleading value.
It's only because the val here is overridden that it isn't initialised in time, because classes must be initialised starting at the top of the inheritance hierarchy and working down. If env were a def you'd also be fine, because it wouldn't be created until called, by which time the vals would all be initialised. Of course, this way you'd get a different array every time you called env, so you might prefer a lazy val, which is initialised the first time it's called but then keeps its value and stays in memory.
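To make the def and lazy val suggestions above concrete, here is a minimal sketch of my own (not from the answers above): one variant turns env into a def, the other into a lazy val; both are evaluated only after construction, when range already returns 2.
class AnimalDef {
  val range: Int = 10
  def env: Array[Int] = new Array[Int](range)      // re-evaluated on every call
}
class AntDef extends AnimalDef {
  override val range: Int = 2
}

class AnimalLazy {
  val range: Int = 10
  lazy val env: Array[Int] = new Array[Int](range) // evaluated once, on first access
}
class AntLazy extends AnimalLazy {
  override val range: Int = 2
}

println((new AntDef).env.size)  // 2
println((new AntLazy).env.size) // 2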

Related

Spark serialization error mystery

Let's say I have the following code:
class Context {
  def compute() = Array(1.0)
}
val ctx = new Context
val data = ctx.compute
Now we are running this code in Spark:
val rdd = sc.parallelize(List(1,2,3))
rdd.map(_ + data(0)).count()
The code above throws org.apache.spark.SparkException: Task not serializable. I'm not asking how to fix it (by extending Serializable or making it a case class); I want to understand why the error happens.
The thing that I don't understand is why it complains about the Context class not being Serializable, even though it's not part of the lambda: rdd.map(_ + data(0)). data here is an Array of values which should be serialized, but it seems that the JVM also captures the ctx reference, which, in my understanding, should not be happening.
As I understand it, in the shell Spark should clean the lambda of the REPL context. If we print the tree after the delambdafy phase, we see these pieces:
object iw extends Object {
  ...
  private[this] val ctx: $line11.iw$Context = _;
  <stable> <accessor> def ctx(): $line11.iw$Context = iw.this.ctx;
  private[this] val data: Array[Double] = _;
  <stable> <accessor> def data(): Array[Double] = iw.this.data;
  ...
}
class anonfun$1 ... {
  final def apply(x$1: Int): Double = anonfun$1.this.apply$mcDI$sp(x$1);
  <specialized> def apply$mcDI$sp(x$1: Int): Double = x$1.+(iw.this.data().apply(0));
  ...
}
So the decompiled lambda code that is sent to the worker node is x$1.+(iw.this.data().apply(0)). The iw.this part belongs to the Spark shell session, so, as I understand it, it should be cleared by the ClosureCleaner, since it has nothing to do with the logic and shouldn't be serialized. Anyway, calling iw.this.data() returns the Array[Double] value of the data variable, which is initialized in the constructor:
def <init>(): type = {
  iw.super.<init>();
  iw.this.ctx = new $line11.iw$Context();
  iw.this.data = iw.this.ctx().compute(); // <== here
  iw.this.res4 = ...
  ()
}
In my understanding, the ctx value has nothing to do with the lambda; it's not part of the closure, hence it shouldn't be serialized. What am I missing or misunderstanding?
This has to do with what Spark considers it can safely use as a closure. In some cases this is not very intuitive, since Spark uses reflection and in many cases can't recognize some of Scala's guarantees (it is not a full compiler or anything) or the fact that some variables in the same object are irrelevant. For safety, Spark will attempt to serialize any objects referenced, which in your case includes iw, which is not serializable.
The code inside ClosureCleaner has a good example:
For instance, transitive cleaning is necessary in the following scenario:
class SomethingNotSerializable {
  def someValue = 1
  def scope(name: String)(body: => Unit) = body
  def someMethod(): Unit = scope("one") {
    def x = someValue
    def y = 2
    scope("two") { println(y + 1) }
  }
}
In this example, scope "two" is not serializable because it references scope "one", which references SomethingNotSerializable. Note that, however, the body of scope "two" does not actually depend on SomethingNotSerializable. This means we can safely null out the parent pointer of a cloned scope "one" and set it the parent of scope "two", such that scope "two" no longer references SomethingNotSerializable transitively.
Probably the easiest fix is to create a local variable, in the same scope, that extracts the value from your object, such that there is no longer any reference to the encapsulating object inside the lambda:
val rdd = sc.parallelize(List(1,2,3))
val data0 = data(0)
rdd.map(_ + data0).count()
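To see that this capture is plain JVM behaviour rather than something Spark invents, here is a small sketch of my own with no Spark involved (Outer and its fields are made up for illustration): a lambda that mentions a field of an enclosing object compiles to a reference through that object, so serializing the lambda drags the whole instance along, while copying the value into a local first does not.
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

class Outer {                                                  // not Serializable, playing the role of iw
  val data = Array(1.0)
  val capturing: Int => Double = x => x + data(0)              // desugars to Outer.this.data(0)
  val clean: Int => Double = { val d0 = data(0); x => x + d0 } // captures only a Double
}

val o = new Outer
val out = new ObjectOutputStream(new ByteArrayOutputStream)
out.writeObject(o.clean)      // fine
out.writeObject(o.capturing)  // typically throws java.io.NotSerializableException: Outer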

How to create a Scala class with private field with public getter, and primary constructor taking a parameter of the same name

Search results so far have led me to believe this is impossible without either a non-primary constructor
class Foo { // NOT OK: 2 extra lines--doesn't leverage Scala's conciseness
  private var _x = 0
  def this(x: Int) { this(); _x = x }
  def x = _x
}
val f = new Foo(x = 123) // OK: named parameter is 'x'
or sacrificing the name of the parameter in the primary constructor (making calls using named parameters ugly)
class Foo(private var _x: Int) { // OK: concise
  def x = _x
}
val f = new Foo(_x = 123) // NOT OK: named parameter should be 'x' not '_x'
Ideally, one could do something like this:
class Foo(private var x: Int) { // OK: concise
  // make just the getter public
  public x
}
val f = new Foo(x = 123) // OK: named parameter is 'x'
I know named parameters are a new thing in the Java world, so it's probably not that important to most, but coming from a language where named parameters are more popular (Python), this issue immediately pops up.
So my question is: is this possible? (Probably not.) And if not, why is such an (in my opinion) important use case left uncovered by the language design? By that I mean that the code has to sacrifice either clean naming or concise definitions, the latter being a hallmark of Scala.
P.S. Consider the case where a public field needs suddenly to be made private, while keeping the getter public, in which case the developer has to change 1 line and add 3 lines to achieve the effect while keeping the interface identical:
class Foo(var x: Int) {} // no boilerplate
->
class Foo { // lots of boilerplate
  private var _x: Int = 0
  def this(x: Int) { this(); _x = x }
  def x = _x
}
Whether this is indeed a design flaw is rather debatable. One could argue that complicating the syntax to allow this particular use case is not worthwhile.
Also, Scala is after all a predominantly functional language, so vars should not appear that frequently in your programs, again raising the question of whether this particular use case needs to be handled in a special way.
However, it seems that a simple solution to your problem would be to use an apply method in the companion object:
class Foo private (private var _x: Int) {
  def x = _x
}
object Foo {
  def apply(x: Int): Foo = new Foo(x)
}
Usage:
val f = Foo(x = 3)
println(f.x)
LATER EDIT:
Here is a solution similar to what you originally requested, but that changes the naming a bit:
class Foo(initialX: Int) {
  private var _x = initialX
  def x = _x
}
Usage:
val f = new Foo(initialX = 3)
The concept you are trying to express, an object whose state is mutable from within the object and yet immutable from the perspective of other objects, would probably be expressed as an Akka actor within the context of an actor system. Outside the context of an actor system, it would seem to be a Java conception of what it means to be an object, transplanted to Scala.
import akka.actor.Actor

class Foo(var x: Int) extends Actor {
  import Foo._
  def receive = {
    case WhatIsX => sender ! x
  }
}
object Foo {
  object WhatIsX
}
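For completeness, a rough usage sketch with the classic Akka actor API (the exact imports, ask pattern and timeout handling are assumptions that may vary between Akka versions):
import akka.actor.{ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

val system = ActorSystem("demo")
val foo = system.actorOf(Props(new Foo(42)))
implicit val timeout: Timeout = Timeout(1.second)
println(Await.result(foo ? Foo.WhatIsX, 1.second)) // 42
system.terminate()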
Not sure about earlier versions, but in Scala 3 it can easily be implemented as follows:
// class with a no-argument constructor
class Foo {
  // private field
  private var _x: Int = 0
  // public getter
  def x: Int = _x
  // public setter
  def x_=(newValue: Int): Unit =
    _x = newValue
  // auxiliary constructor
  def this(value: Int) =
    this()
    _x = value
}
Note
Any definition within the primary constructor (i.e. the class body) is public, unless you prepend it with the private modifier.
Appending _= to a method name with a Unit return type makes it a setter.
A constructor parameter prefixed with neither val nor var does not become a public member; it stays visible only inside the class.
Then it follows:
val noArgFoo = Foo() // no argument case
println(noArgFoo.x) // the public getter prints 0
val withArgFoo = Foo(5) // with argument case
println(withArgFoo.x) // the public getter prints 5
noArgFoo.x = 100 // use the public setter to update x value
println(noArgFoo.x) // the public getter prints 100
withArgFoo.x = 1000 // use the public setter to update x value
println(withArgFoo.x) // the public getter prints 1000
This solution is exactly what you asked for, in a principled way and without any ad hoc workaround such as companion objects and apply methods.

Trouble using Implicit Ordered with PriorityQueue (Scala)

I'm trying to create a data structure that has a PriorityQueue in it. I've succeeded in making a non-generic version of it. I can tell it works because it solves the A.I. problem I have.
Here is a snippet of it:
class ProntoPriorityQueue { //TODO make generic
  implicit def orderedNode(node: Node): Ordered[Node] = new Ordered[Node] {
    def compare(other: Node) = node.compare(other)
  }
  val hashSet = new HashSet[Node]
  val priorityQueue = new PriorityQueue[Node]()
  ...
I'm trying to make it generic, but if I use this version it stops solving the problem:
class PQ[T <% Ordered[T]] {
  //[T]()(implicit val ord: T => Ordered[T]) {
  //[T]()(implicit val ord: Ordering[T]) {
  val hashSet = new HashSet[T]
  val priorityQueue = new PriorityQueue[T]
  ...
I've also tried what's commented out instead of using [T <% Ordered[T]]
Here is the code that calls PQ:
//the following def is commented out while using ProntoPriorityQueue
implicit def orderedNode(node: Node): Ordered[Node] = new Ordered[Node] {
  def compare(other: Node) = node.compare(other)
} //I've also tried making this return an Ordering[Node]
val frontier = new PQ[Node] //new ProntoPriorityQueue
//have also tried (not together):
val frontier = new PQ[Node]()(orderedNode)
I've also tried moving the implicit def into the Node object (and importing it), but essentially the same problem.
What am I doing wrong in the generic version? Where should I put the implicit?
Solution
The problem was not with my implicit definition. The problem was that the implicit ordering was being picked up by a Set that was automatically generated in a for(...) yield(...) statement. This caused a problem where the yielded set only contained one state.
What's wrong with simply defining an Ordering on your Node (Ordering[Node]) and using the already-generic Scala PriorityQueue?
As a general rule, it's better to work with Ordering[T] than with T <: Ordered[T] or T <% Ordered[T]. Conceptually, Ordered[T] is an intrinsic (inherited or implemented) property of the type itself. Notably, a type can have only one intrinsic ordering relationship defined this way. Ordering[T] is an external specification of the ordering relationship. There can be any number of different Ordering[T]s.
Also, if you're not already aware, you should know that the difference between T <: U and T <% U is that while the former includes only nominal subtype relations (actual inheritance), the latter also includes the application of implicit conversions that yield a value conforming to the type bound.
So if you want to use Node <% Ordered[Node] and you don't have a compare method defined in the class, an implicit conversion will be applied every time a comparison needs to be made. Conversely, if your type has its own compare, the implicit conversion will never be applied and you'll be stuck with that "built-in" ordering.
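A tiny illustration of that last point, using a made-up Person class (my own example, not from the question): the same type can be sorted with several different external Orderings.
case class Person(name: String, age: Int)

val byAge:  Ordering[Person] = Ordering.by((p: Person) => p.age)
val byName: Ordering[Person] = Ordering.by((p: Person) => p.name)

val people = List(Person("Bo", 25), Person("Al", 30))
println(people.sorted(byAge).head)  // Person(Bo,25)
println(people.sorted(byName).head) // Person(Al,30)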
Addendum
I'll give a few examples based on a class, call it CIString, that simply encapsulates a String and implements case-insensitive ordering.
import scala.collection.immutable.TreeSet // needed for the TreeSet examples below

/* Here's how it would be with direct implementation of `Ordered` */
class CIString1(val s: String) extends Ordered[CIString1] {
  private val lowerS = s.toLowerCase
  def compare(other: CIString1) = lowerS.compareTo(other.lowerS)
}
/* An uninteresting, empty ordered set of CIString1
   (fails without the `extends` clause) */
val os1 = TreeSet[CIString1]()
/* Here's how it would look with ordering external to `CIString2`
   using an implicit conversion to `Ordered` */
class CIString2(val s: String) {
  val lowerS = s.toLowerCase
}
class CIString2O(ciS: CIString2) extends Ordered[CIString2] {
  def compare(other: CIString2) = ciS.lowerS.compareTo(other.lowerS)
}
implicit def cis2ciso(ciS: CIString2) = new CIString2O(ciS)
/* An uninteresting, empty ordered set of CIString2
   (fails without the implicit conversion) */
val os2 = TreeSet[CIString2]()
/* Here's how it would look with ordering external to `CIString3`
   using an `Ordering` */
class CIString3(val s: String) {
  val lowerS = s.toLowerCase
}
/* The implicit object could be replaced by
   a class and an implicit val of that class */
implicit object CIString3Ordering extends Ordering[CIString3] {
  def compare(a: CIString3, b: CIString3): Int = a.lowerS.compareTo(b.lowerS)
}
/* An uninteresting, empty ordered set of CIString3
   (fails without the implicit object) */
val os3 = TreeSet[CIString3]()
Well, one possible problem is that your Ordered[Node] is not a Node:
implicit def orderedNode(node: Node): Ordered[Node] = new Ordered[Node] {
  def compare(other: Node) = node.compare(other)
}
I'd try with an Ordering[Node] instead, which you say you tried but there isn't much more information about. PQ would be declared as PQ[T : Ordering].
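Putting both answers together, here is a minimal sketch of what the generic version could look like with an Ordering context bound (the Node class and its cost field are made up for illustration; your real Node would supply its own Ordering):
import scala.collection.mutable.{HashSet, PriorityQueue}

case class Node(cost: Int)
object Node {
  // smallest cost dequeued first (PriorityQueue pops the "largest" element per the Ordering)
  implicit val nodeOrdering: Ordering[Node] = Ordering.by((n: Node) => -n.cost)
}

class PQ[T: Ordering] {
  val hashSet = new HashSet[T]
  val priorityQueue = new PriorityQueue[T]
  def enqueue(t: T): Unit = { hashSet += t; priorityQueue += t }
  def dequeue(): T = priorityQueue.dequeue()
}

val frontier = new PQ[Node] // Ordering[Node] is found in the Node companion object
frontier.enqueue(Node(3))
frontier.enqueue(Node(1))
println(frontier.dequeue()) // Node(1)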

Scala - initialization order of vals

I have this piece of code that loads Properties from a file:
class Config {
  val properties: Properties = {
    val p = new Properties()
    p.load(Thread.currentThread().getContextClassLoader.getResourceAsStream("props"))
    p
  }
  val forumId = properties.get("forum_id")
}
This seems to be working fine.
I have tried moving the initialization of properties into another val, loadedProperties, like this:
class Config {
  val properties: Properties = loadedProps
  val forumId = properties.get("forum_id")
  private val loadedProps = {
    val p = new Properties()
    p.load(Thread.currentThread().getContextClassLoader.getResourceAsStream("props"))
    p
  }
}
But it doesn't work! (properties is null in properties.get("forum_id") ).
Why would that be? Isn't loadedProps evaluated when referenced by properties?
Secondly, is this a good way to initialize variables that require non-trivial processing? In Java, I would declare them final fields, and do the initialization-related operations in the constructor.
Is there a pattern for this scenario in Scala?
Thank you!
Vals are initialized in the order they are declared (well, strictly speaking, non-lazy vals are), so properties is getting initialized before loadedProps. Or, in other words, loadedProps is still null when properties is getting initialized.
The simplest solution here is to define loadedProps before properties:
class Config {
  private val loadedProps = {
    val p = new Properties()
    p.load(Thread.currentThread().getContextClassLoader.getResourceAsStream("props"))
    p
  }
  val properties: Properties = loadedProps
  val forumId = properties.get("forum_id")
}
You could also make loadedProps lazy, meaning that it will be initialized on its first access:
class Config {
  val properties: Properties = loadedProps
  val forumId = properties.get("forum_id")
  private lazy val loadedProps = {
    val p = new Properties()
    p.load(Thread.currentThread().getContextClassLoader.getResourceAsStream("props"))
    p
  }
}
Using lazy val has the advantage that your code is more robust to refactoring, as merely changing the declaration order of your vals won't break your code.
Also, in this particular occurrence, you can just turn loadedProps into a def (as suggested by #NIA), as it is only used once anyway.
I think loadedProps can simply be turned into a method here, by replacing val with def:
private def loadedProps = {
  // Tons of code
}
In this case you can be sure that it is evaluated when you call it.
But I'm not sure whether that counts as a pattern for this case.
Just an addition with a little more explanation:
Your properties field is initialized earlier than the loadedProps field here. null is a field's value before initialization - that's why you get it. In the def case it's just a method call instead of an access to a field, so everything is fine (a method's body may be called several times - there is no initialization involved). See http://docs.scala-lang.org/tutorials/FAQ/initialization-order.html. You may use def or lazy val to fix it.
Why is def so different? Because a def may be called several times, but a val only once (so its first and only call is actually the initialization of the field).
A lazy val is initialized only when you first access it, so it would also help.
Another, simpler example of what's going on:
scala> class A {val a = b; val b = 5}
<console>:7: warning: Reference to uninitialized value b
class A {val a = b; val b = 5}
^
defined class A
scala> (new A).a
res2: Int = 0 //null
Talking more generally, Scala could theoretically analyze the dependency graph between fields (which field needs which other field) and start initialization from the final nodes. But in practice every module is compiled separately and the compiler might not even know those dependencies (it might even be Java, which calls Scala, which calls Java), so it just does sequential initialization.
So, because of that, it can't even detect simple loops:
scala> class A {val a: Int = b; val b: Int = a}
<console>:7: warning: Reference to uninitialized value b
class A {val a: Int = b; val b: Int = a}
^
defined class A
scala> (new A).a
res4: Int = 0
scala> class A {lazy val a: Int = b; lazy val b: Int = a}
defined class A
scala> (new A).a
java.lang.StackOverflowError
Actually, such a loop (inside one module) could theoretically be detected in a separate build step, but it wouldn't help much, as it's pretty obvious anyway.

Scala Numeric init with constant 0

Let's say I have a utility class called MathUtil, and it looks like this:
abstract class MathUtil(T:Numeric){
  def nextNumber(value:T)
  def result():T
}
Let's say I subclass it this way:
class SumUtil[T:Numeric] extends MathUtil[T]{
  private var sum:T = 0
  override def nextNumber(value:T){
    sum = sum + value
  }
  override def result():T = sum
}
I have a problem with the statement
private var sum:T = 0
Now, I have to initialize sum to 0. I would guess any numeric type has a way to represent 0. I'm pretty new to Scala. How do I solve this issue?
The Numeric type class instance has a zero method that does what you want:
class SumUtil[T: Numeric] extends MathUtil[T] {
  private var sum: T = implicitly[Numeric[T]].zero
  override def nextNumber(value: T) {
    sum = implicitly[Numeric[T]].plus(sum, value)
  }
  override def result(): T = sum
}
Note that you also need the instance for the plus method, unless you import Numeric.Implicits._, in which case you can use +. You can also clean the code up a bit by not using the context bound syntax in this case:
class SumUtil[T](implicit ev: Numeric[T]) extends MathUtil[T] {
  import Numeric.Implicits._
  private var sum: T = ev.zero
  override def nextNumber(value: T) {
    sum = sum + value
  }
  override def result(): T = sum
}
This is exactly equivalent: the context bound version is just syntactic sugar for this implicit argument, but if you need to use that argument explicitly (as you do here, for its zero), I find it cleaner to write the desugared version.
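A tiny sketch of that equivalence (my own example, not from the answer): a context bound is nothing more than sugar for an extra implicit parameter list.
// These two definitions compile to the same thing:
def total1[T: Numeric](xs: List[T]): T = xs.sum
def total2[T](xs: List[T])(implicit ev: Numeric[T]): T = xs.sum

println(total1(List(1, 2, 3)))  // 6
println(total2(List(1.0, 2.5))) // 3.5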
I think there needs to be a little clarification of exactly what you're trying to accomplish. From the Scala docs, the Numeric type itself is generic. My feeling here is that what you actually want is to describe a MathUtil abstraction that works for any T that has a Numeric[T], rather than for subclasses of Numeric[_], which is what your code is currently describing. Here is the correct implementation based on that assumption.
//Define a MathUtil that works on any T
abstract class MathUtil[T] {
  def nextNumber(value: T)
  def result(): T
}
//Define a SumUtil that works on any T that has an available Numeric.
//Will search implicit scope, but also allows you to provide an
//implementation if desired.
class SumUtil[T](implicit n: Numeric[T]) extends MathUtil[T] {
  //Use the Numeric to generate the zero correctly.
  private var sum: T = n.zero
  //Use the Numeric to correctly add the sum and value.
  override def nextNumber(value: T) = sum = n.plus(sum, value)
  override def result(): T = sum
}
//Test that it works.
val a = new SumUtil[Int]
val b = List(1, 2, 3)
b map a.nextNumber //Quick and dirty test... returns a meaningless list
println(a.result) //Does indeed print 6
If the above doesn't do what you want, please clarify your question.