Initialised class value evaluates to null in overridden method - scala

I'm looking for some insight into Scala internals. We've just come out the other side of a painful debug session, and found out our problem was caused by an unexpected null value that we had thought would be pre-initialised. We can't fathom why that would be the case.
Here is an extremely cut down example of the code which illustrates the problem (if it looks convoluted it's because it's much more complicated in real code, but I've left the basic structure alone in case it's significant).
trait A {
println("in A")
def usefulMethod
def overrideThisMethod = {
//defaultImplementation
}
val stubbableFunction = {
//do some stuff
val stubbableMethod = overrideThisMethod
//do some other stuff with stubbableMethod
}
}
class B extends A {
println("in B")
def usefulMethod = {
//do something with stubbableFunction
}
}
class StubB extends B {
println("in StubB")
var usefulVar = "super useful" //<<---this is the var that ends up being null
override def overrideThisMethod {
println("usefulVar = " + usefulVar)
}
}
If we kick off the chain of initialisation, this is what is printed to the console:
scala> val stub = new StubB
in A
usefulVar = null
in B
in StubB
My assumptions
I assume that in order to instantiate StubB, first we instantiate trait A, and then B and finally StubB: hence the printing order of ("in A ", "in B", "in StubB"). I assume stubbableFunction in trait A is evaluated on initialisation because it's a val, same for stubbableMethod.
From here on is where I get confused.
My question
When overrideThisMethod is evaluated in trait A, I would expect the classloader to follow the chain downwards to StubB (which it does, you can tell because of the printing of "usefulVar = null") but... why is the value null here? How can overrideThisMethod in StubB be evaluated without first initialising the StubB class and therefore setting usefulVar? I didn't know you could have "orphaned" methods being evaluated this way - surely methods have to belong to a class which has to be initialised before you can call the method?
We actually solved the problem by changing the val stubbableFunction = to def stubbableFunction = in trait A, but we'd still really like to understand what was going on here. I'm looking forward to learning something interesting about how Scala (or maybe Java) works under the hood :)
edit: I changed the null value to be var and the same thing happens - question updated for clarity in response to m-z's answer

I stripped down the original code even more leaving the original behavior intact. I also renamed some methods and vals to express the semantics better (mostly function vs value):
trait A {
println("in A")
def overridableComputation = {
println("A::overridableComputation")
1
}
val stubbableValue = overridableComputation
def stubbableMethod = overridableComputation
}
class StubB extends A {
println("in StubB")
val usefulVal = "super useful" //<<---this is the val that ends up being null
override def overridableComputation = {
println("StubB::overridableComputation")
println("usefulVal = " + usefulVal)
2
}
}
When run it yields the following output:
in A
StubB::overridableComputation
usefulVal = null
in StubB
super useful
Here are some Scala implementation details to help us understand what is happening:
the main constructor is intertwined with the class definition, i.e. most of the code (except method definitions) between curly braces is put into the constructor;
each val of the class is implemented as a private field and a getter method, both field and method are named after val (JavaBean convention is not adhered to);
the value for the val is computed within the constructor and is used to initialize the field.
As m-z already noted, the initialization runs top down, i.e. the parent's class or trait constructor is called first, the child's constructor is called last. So here's what happens when you call new StubB():
A StubB object is allocated in heap, all its fields are set to default values depending on their types (0, 0.0, null, etc);
A::A is invoked first as the top-most constructor;
"in A" is printed;
in order to compute the value for stubbableValue, overridableComputation is called; the catch is that the overridden method is the one that runs, i.e. StubB::overridableComputation (see "What's wrong with overridable method calls in constructors?" for more details);
"StubB::overridableComputation" is printed;
since usefulVal is not yet initialized by StubB::StubB, its default value is used, so "usefulVal = null" is printed;
2 is returned;
stubbableValue is initialized with the computed value of 2;
StubB::StubB is invoked as the next constructor in chain;
"in StubB" is printed;
the value for usefulVal is computed, in this case just the literal "super useful" is used;
usefulVal is initialized with the value of "super useful".
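Since the value for stubbableValue is computed while the constructor runs, changing it to a def (as you did) postpones the call to overridableComputation until after construction, by which point usefulVal has been set.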
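The sequence above can be reproduced in a few lines. This is only a minimal sketch: the names InitOrderDemo, Base, Derived, eager and deferred are made up for illustration and do not appear in the original code. It contrasts a strict val, which calls the overridden method while the parent initializer is still running, with a lazy val, which defers the call until first access.

```scala
object InitOrderDemo {
  trait Base {
    def compute: String = "base"

    // Strict val: evaluated inside Base's initializer,
    // before any field of a subclass has been assigned.
    val eager: String = compute

    // Lazy val: evaluated on first access, which here
    // happens only after construction has finished.
    lazy val deferred: String = compute
  }

  class Derived extends Base {
    val useful: String = "super useful"

    // Dynamic dispatch makes Base's initializer call this version,
    // even though `useful` still holds its default value (null) then.
    override def compute: String = "useful = " + useful
  }
}
```

With these definitions, new InitOrderDemo.Derived().eager captures the null default, while deferred sees the initialized field as long as it is first read after construction.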
To prove these assumptions, the Fernflower Java decompiler can be used. Here's how the above Scala code looks when decompiled to Java (I removed irrelevant @ScalaSignature annotations):
import scala.collection.mutable.StringBuilder;
public class A {
private final int stubbableValue;
public int overridableComputation() {
.MODULE$.println("A::overridableComputation");
return 1;
}
public int stubbableValue() {
return this.stubbableValue;
}
public int stubbableMethod() {
return this.overridableComputation();
}
public A() {
.MODULE$.println("in A");
// Note, that overridden method is called below!
this.stubbableValue = this.overridableComputation();
}
}
public class StubB extends A {
private final String usefulVal;
public String usefulVal() {
return this.usefulVal;
}
public int overridableComputation() {
.MODULE$.println("StubB::overridableComputation");
.MODULE$.println(
(new StringBuilder()).append("usefulVal = ")
.append(this.usefulVal())
.toString()
);
return 2;
}
public StubB() {
.MODULE$.println("in StubB");
this.usefulVal = "super useful";
}
}
If A is a trait instead of a class, the code is a bit more verbose, but the behavior is consistent with the class A variant. Since the JVM doesn't support multiple inheritance, the Scala compiler splits a trait into an interface and an abstract helper class which only contains static members:
import scala.collection.mutable.StringBuilder;
public abstract class A$class {
public static int overridableComputation(A $this) {
.MODULE$.println("A::overridableComputation");
return 1;
}
public static int stubbableMethod(A $this) {
return $this.overridableComputation();
}
public static void $init$(A $this) {
.MODULE$.println("in A");
$this.so32501595$A$_setter_$stubbableValue_$eq($this.overridableComputation());
}
}
public interface A {
void so32501595$A$_setter_$stubbableValue_$eq(int var1);
int overridableComputation();
int stubbableValue();
int stubbableMethod();
}
public class StubB implements A {
private final String usefulVal;
private final int stubbableValue;
public int stubbableValue() {
return this.stubbableValue;
}
public void so32501595$A$_setter_$stubbableValue_$eq(int x$1) {
this.stubbableValue = x$1;
}
public String usefulVal() {
return this.usefulVal;
}
public int overridableComputation() {
.MODULE$.println("StubB::overridableComputation");
.MODULE$.println(
(new StringBuilder()).append("usefulVal = ")
.append(this.usefulVal())
.toString()
);
return 2;
}
public StubB() {
A$class.$init$(this);
.MODULE$.println("in StubB");
this.usefulVal = "super useful";
}
}
Remember that a val is rendered into a field and a method? Since several traits can be mixed into a single class, a trait cannot be implemented as a class. Therefore, the method part of a val is put into an interface, while a field part is put into the class that a trait gets mixed into.
The abstract class contains the code of all the trait's methods, access to the member fields is provided by passing $this explicitly.

Related

How to qualify methods as static in Scala?

I have a class
class MyClass {
def apply(myRDD: RDD[String]) {
val rdd2 = myRDD.map(myString => {
// do String manipulation
})
}
}
object MyClass {
}
Since I have a block of code performing one task (the area that says "do String manipulation"), I thought I should break it out into its own method. Since the method is not changing the state of the class, I thought I should make it a static method.
How do I do that?
I thought that you can just pop a method inside the companion object and it would be available as a static method, like this:
object MyClass {
def doStringManipulation(myString: String) = {
// do String manipulation
}
}
but when I try val rdd2 = myRDD.map(myString => { doStringManipulation(myString)}), Scala doesn't recognize the method and it forces me to do MyClass.doStringManipulation(myString) in order to call it.
What am I doing wrong?
In Scala there are no static methods: all methods are defined over an object, be it an instance of a class or a singleton, as the one you defined in your question.
As you correctly pointed out, by having a class and an object named in the same way in the same compilation unit you make the object a companion of the class, which means that the two have access to each other's private fields and methods, but this does not mean they are available without specifying which object you are accessing.
What you want to do is either use the long form as mentioned (MyClass.doStringManipulation(myString)) or, if you think it makes sense, import the method into the class's scope, as follows:
import MyClass.doStringManipulation
class MyClass {
def apply(myRDD: RDD[String]): Unit = {
val rdd2 = myRDD.map(doStringManipulation)
}
}
object MyClass {
private def doStringManipulation(myString: String): String = {
???
}
}
As a side note, for the MyClass.apply method you used a notation (procedure syntax) which is going to disappear in the future:
// this is a shorthand for a method that returns `Unit` but is going to disappear
def method(parameter: Type) {
// does things
}
// this means the same, but it's going to stay
// the `=` is enough, even without the explicit return type
// unless, that is, you want to force the method to discard the last value and return `Unit`
def method(parameter: Type): Unit = {
// does things
}
You should follow Scala's advice and qualify the call:
val rdd2 = myRDD.map(MyClass.doStringManipulation)
Or write this inside the class, then the unqualified call will work as expected:
import MyClass._
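Both answers come down to bringing the companion's members into scope. The following is a small sketch of the import-inside-the-class variant that runs without Spark; Shouter, shout and CompanionImportDemo are hypothetical names, and a plain List stands in for the RDD.

```scala
object CompanionImportDemo {
  class Shouter {
    // Placed inside the class body, this makes the companion's members
    // (including private ones, since the two are companions) usable unqualified.
    import Shouter._

    def apply(words: List[String]): List[String] = words.map(shout)
  }

  object Shouter {
    // Lives on the singleton object; there is no JVM-style static in Scala source.
    private def shout(s: String): String = s.toUpperCase + "!"
  }
}
```

Calling new CompanionImportDemo.Shouter().apply(List("a", "b")) then goes through Shouter.shout without spelling out the prefix.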

Why can't I use this.getClass in auxiliary constructor in scala?

Why can't I use this.getClass in auxiliary constructor in scala? Are there any alternatives?
More specifically, I am trying to call LoggerFactory.getLogger of slf4j in the auxiliary constructor. I have a hack now where I am forced to pass a logger object to the constructor.
A simple contrived example (does not compile) which shows what I am trying to do:
class A (numbers : Double) {
val logger = LoggerFactory.getLogger(this.getClass)
def this(numbersAsStr: String) = this(try { numbersAsStr.toDouble } catch { case _ => LoggerFactory.getLogger(this.getClass).error("Failed to convert"); 0 })
}
This is actually a limitation of the JVM rather than specifically a Scala problem. Here's a similar example in Java:
public class ThisTest {
public final String name;
public ThisTest(String n) {
name = n;
}
public ThisTest() {
// trying to use `this` in a call to the primary constructor
this(this.getClass().getName());
}
}
When you try to compile it you get an error:
$ javac ThisTest.java
ThisTest.java:10: error: cannot reference this before supertype constructor has been called
this(this.getClass().getName());
^
1 error
The problem is that you're trying to reference this before any of the super-constructors for this have been called. You can't use a this reference in a super() or this() call no matter what JVM language you use, because that's the way classes work on the JVM.
However, you can totally avoid this problem by restructuring your code to put the reference to this after the this() call:
class A (numbers: Double) {
val logger = LoggerFactory.getLogger(this.getClass)
def this(numbersAsStr: String) = {
this ( try { numbersAsStr.toDouble } catch { case _ => 0 } )
LoggerFactory.getLogger(this.getClass).error("Failed to convert") // caution: this runs on every call, not only after a failed conversion
}
}
You might actually want access to the thrown exception for your log info. In that case, I'd just use LoggerFactory.getLogger(classOf[A]). That won't give you the actual class name if you're using inheritance (which I was assuming was the case here), but if you include the stack trace in the log then you should be able to figure it out.
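One way to get at the exception, sketched here under a few assumptions: the parsing moves into a companion-object helper, which the this(...) call may use because it never touches the instance under construction. Measurement and parseOrZero are made-up names, and java.util.logging stands in for slf4j only to keep the sketch self-contained.

```scala
import java.util.logging.Logger

class Measurement(val value: Double) {
  // The argument is computed by the companion before this(...) runs,
  // so no reference to `this` is needed.
  def this(text: String) = this(Measurement.parseOrZero(text))
}

object Measurement {
  // classOf gives the static class, not the runtime class of a subclass instance.
  private val logger = Logger.getLogger(classOf[Measurement].getName)

  private def parseOrZero(text: String): Double =
    try text.toDouble
    catch {
      case e: NumberFormatException =>
        // Logged only when the conversion actually fails.
        logger.warning("Failed to convert '" + text + "': " + e.getMessage)
        0.0
    }
}
```

The trade-off is the one already mentioned: the logger is tied to Measurement itself, so a subclass would not show up under its own name.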
Not sure I understand the question. Here is a guess:
class Foo(val c: Class[_]) {
def this() = this(classOf[Foo])
}
new Foo().c // -> class Foo

A deeper explanation of why Lazy Vals work in scala constructors?

I understand the general use of lazy vals to get around initialization order problems in Scala, but something has always bothered me about this explanation. If a lazy val is initialized during its first access, and the parent constructor is making use of it BEFORE it could possibly exist - what exactly is going on here? In the example below, when println("A: " + x1) is called, class B doesn't exist yet, but the value does print correctly. At the exact moment we see "A: hello" - did this happen in the constructor of A, or was it delayed somehow until B fully existed? In a sense, marking it lazy has counter-intuitively made it available ahead of schedule?
Thank you
(referenced from https://github.com/paulp/scala-faq/wiki/Initialization-Order)
abstract class A {
val x1: String
println("A: " + x1)
}
class B extends A {
lazy val x1: String = "hello"
}
The object itself doesn't exist, but fields within the object can exist and be calculated.
What's happening is that from within A's constructor, it's accessing x1 and therefore forcing the lazy value to be computed. The reason A can know it needs to call B's x1 method, is because it's dynamically dispatched (just like in Java).
If it helps, the stack would be something similar to this:
B.x1$lzycompute
B.x1
A.<init>
B.<init>
If it helps, here is a rough version of your code in Java:
public class Testing {
public static void main(String[] args) {
new B();
}
public static abstract class A {
public abstract String x1();
public A() {
System.out.println(x1());
}
}
public static class B extends A {
private boolean inited = false;
private String x1;
private String computeX1() {
x1 = "hello";
inited = true;
return x1;
}
public String x1() {
return this.inited ? x1 : computeX1();
}
}
}
The "BEFORE" relation just refers to the order that initializers are run.
When you allocate an object on the heap, you simply allocate it and then call the init methods to init it.
There's no sense in which there is an instance of parent A that precedes an instance of child B.
They are the same object, seen as the parts of its type.
It's not like when people tell me I look like my father (who is not me).
Anyway, fragility ensues if it's not laziness all the way down:
abstract class A {
val x1: String
val x2: String
println("A: " + x1)
println("A2: " + x2)
}
class B extends A {
lazy val x1: String = "hello"
lazy val x2: String = x3
val x3: String = "bye"
}
object Test extends App {
val b = new B
Console println (b.x1,b.x2,b.x3)
}
With the result:
A: hello
A2: null
(hello,null,bye)
That's why the general advice is to use defs instead of vals and, for that matter, traits instead of classes (since with traits you are more likely to have heard of and followed the first rule).
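For comparison, here is a sketch of the same hierarchy with laziness "all the way down". LazyChainDemo, Base, Child and seenByBase are illustrative names; the only substantive change from the example above is that x3 is lazy too, so x2 no longer caches a null read during the parent constructor.

```scala
object LazyChainDemo {
  abstract class Base {
    val x1: String
    val x2: String

    // Both reads happen inside Base's constructor,
    // before any strict val of a subclass is assigned.
    val seenByBase: String = x1 + "," + x2
  }

  class Child extends Base {
    lazy val x1: String = "hello"
    lazy val x2: String = x3
    // Lazy as well: forcing x2 now forces x3 instead of reading a null field.
    lazy val x3: String = "bye"
  }
}
```

Making x3 a def would work just as well here, at the cost of recomputing it on every call.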

Why scala uses reflection to call method on structural type?

If a function accepts a structural type, it can be defined as:
def doTheThings(duck: { def walk; def quack }) { duck.quack }
or
type DuckType = { def walk; def quack }
def doTheThings(duck: DuckType) { duck.quack }
Then, you can use that function in the following way:
class Dog {
def walk { println("Dog walk") }
def quack { println("Dog quacks") }
}
def main(args: Array[String]) {
doTheThings(new Dog);
}
If you decompile (to Java) the classes generated by scalac for my example, you can see that the argument of doTheThings is of type Object and the implementation uses reflection to call methods on the argument (i.e. duck.quack).
My question is: why reflection? Isn't it possible just to use an anonymous class and invokevirtual instead of reflection?
Here is a way to translate (implement) the structural type calls for my example (Java syntax, but the point is the bytecode):
class DuckyDogTest {
interface DuckType {
void walk();
void quack();
}
static void doTheThing(DuckType d) {
d.quack();
}
static class Dog {
public void walk() { System.out.println("Dog walk"); }
public void quack() { System.out.println("Dog quack"); }
}
public static void main(String[] args) {
final Dog d = new Dog();
doTheThing(new DuckType() {
public final void walk() { d.walk(); }
public final void quack() { d.quack();}
});
}
}
Consider a simple proposition:
type T = { def quack(): Unit; def walk(): Unit }
def f(a: T, b: T) =
if (a eq b) println("They are the same duck!")
else println("Different ducks")
f(x, x) // x is a duck
It would print Different ducks under your proposal. You could further refine it, but you just cannot keep referential equality intact using a proxy.
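The point about identity can be checked directly. This is a small sketch assuming Scala 2 on the JVM, where structural calls are dispatched reflectively on the original object; DuckDemo, Duck, Dog, speak and sameDuck are illustrative names.

```scala
import scala.language.reflectiveCalls

object DuckDemo {
  type Duck = { def quack(): String }

  class Dog {
    def quack(): String = "Dog quacks"
  }

  // The argument is passed as the original object; the call to quack()
  // is resolved by name at runtime, with no wrapper in between.
  def speak(d: Duck): String = d.quack()

  // Because nothing is wrapped, reference equality still holds.
  def sameDuck(a: Duck, b: Duck): Boolean = a eq b
}
```

Passing the same Dog twice to sameDuck yields true, which a proxy-per-call translation could not guarantee.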
A possible solution would be to use the type class pattern, but that would require passing another parameter (even if implicit). Still, it's faster. But that's mostly because of the lameness of Java's reflection speed. Hopefully, method handles will get around the speed problem. Unfortunately, Scala is not scheduled to give up on Java 5, 6 and 7 (which do not have method handles) for some time...
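The type class alternative mentioned above might look roughly like this. It is a sketch with hypothetical names (Quacks, dogQuacks, QuackTypeClassDemo), showing the extra implicit parameter that replaces the reflective call.

```scala
object QuackTypeClassDemo {
  // Evidence that values of type A know how to quack.
  trait Quacks[A] {
    def quack(a: A): String
  }

  class Dog

  object Quacks {
    // Instance placed in the type class companion so it is found implicitly.
    implicit val dogQuacks: Quacks[Dog] = new Quacks[Dog] {
      def quack(a: Dog): String = "Dog quacks"
    }
  }

  // The second parameter list is the "other parameter" that must be passed;
  // the call to quack is an ordinary interface call, with no reflection.
  def doTheThings[A](duck: A)(implicit q: Quacks[A]): String = q.quack(duck)
}
```

The duck itself is passed through unchanged, so referential equality is unaffected.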
In addition to your proxy object implementing methods on the structural type, it would also need to have appropriate pass-through implementations of all of the methods on Any (equals, hashCode, toString, isInstanceOf, asInstanceOf) and AnyRef (getClass, wait, notify, notifyAll, and synchronized). While some of these would be straightforward, some would be almost impossible to get right. In particular, all of the AnyRef methods listed are "final" (for Java compatibility and security) and so couldn't be properly implemented by your proxy object.

How to implement intermediate types for implicit methods?

Assume I want to offer method foo on existing type A outside of my control. As far as I know, the canonical way to do this in Scala is implementing an implicit conversion from A to some type that implements foo. Now I basically see two options.
1) Define a separate, maybe even hidden class for the purpose:
protected class Fooable(a : A) {
def foo(...) = { ... }
}
implicit def a2fooable(a : A) = new Fooable(a)
2) Define an anonymous class inline:
implicit def a2fooable(a : A) = new { def foo(...) = { ... } }
Variant 2) is certainly less boilerplate, especially when lots of type parameters are involved. On the other hand, I think it should create more overhead since (conceptually) one class per conversion is created, as opposed to one class globally in 1).
Is there a general guideline? Is there no difference, because compiler/VM get rid of the overhead of 2)?
Using a separate class is better for performance, as the alternative uses reflection.
Consider that
new { def foo(...) = { ... } }
is really
new AnyRef { def foo(...) = { ... } }
Now, AnyRef doesn't have a method foo. In Scala, this type is actually AnyRef { def foo(...): ... }, which, if you remove AnyRef, you should recognize as a structural type.
At compile time, this type can be passed back and forth, and everywhere it will be known that the method foo is callable. However, there's no structural type in the JVM, and to add an interface would require a proxy object, which would cause some problems such as breaking referential equality (i.e., an object would not be equal to a structural type version of itself).
The way found around that was to use cached reflection calls for structural types.
So, if you want to use the Pimp My Library pattern for any performance-sensitive application, declare a class.
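As an illustration of "declare a class", here is a sketch using an implicit class, a shorthand available since Scala 2.10 (so later than the 2.8 setting discussed in this thread). FooSyntax and Fooable are illustrative names, and the body of foo is invented for the example.

```scala
object FooSyntax {
  // A named wrapper class: "bar".foo compiles to a normal method call
  // on Fooable, not to a reflective lookup on a structural type.
  implicit class Fooable(val s: String) extends AnyVal {
    def foo: String = s + "!"
  }
}
```

After import FooSyntax._, the expression "bar".foo resolves through the named class. Extending AnyVal is optional and is only there to let the compiler avoid allocating the wrapper in common cases.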
I believe 1 and 2 get compiled to the same bytecode (except for the class name that gets generated in case 2).
If Fooable exists only for you to be able to convert implicitly A to Fooable (and you're never going to directly create and use a Fooable), then I would go with option 2.
However, if you control A (meaning A is not a java library class that you can't subclass) I would consider using a trait instead of implicit conversions to add behaviour to A.
UPDATE:
I have to reconsider my answer. I would use variant 1 of your code, because variant 2 turns out to be using reflection (scala 2.8.1 on Linux).
I compiled these two versions of the same code, decompiled them to java with jd-gui and here are the results:
source code with named class
class NamedClass { def Foo : String = "foo" }
object test {
implicit def StrToFooable(a: String) = new NamedClass
def main(args: Array[String]) { println("bar".Foo) }
}
source code with anonymous class
object test {
implicit def StrToFooable(a: String) = new { def Foo : String = "foo" }
def main(args: Array[String]) { println("bar".Foo) }
}
compiled and decompiled to Java with jd-gui. The "named" version generates a NamedClass.class that gets decompiled to this Java:
public class NamedClass
implements ScalaObject
{
public String Foo()
{
return "foo";
}
}
the anonymous generates a test$$anon$1 class that gets decompiled to the following java
public final class test$$anon$1
{
public String Foo()
{
return "foo";
}
}
so almost identical, except for the anonymous class being "final" (they apparently want to make extra sure you won't go out of your way to try and subclass an anonymous class...)
however at the call site I get this java for the "named" version
public void main(String[] args)
{
Predef..MODULE$.println(StrToFooable("bar").Foo());
}
and this for the anonymous
public void main(String[] args) {
Object qual1 = StrToFooable("bar"); Object exceptionResult1 = null;
try {
exceptionResult1 = reflMethod$Method1(qual1.getClass()).invoke(qual1, new Object[0]);
Predef..MODULE$.println((String)exceptionResult1);
return;
} catch (InvocationTargetException localInvocationTargetException) {
throw localInvocationTargetException.getCause();
}
}
I googled a little and found that others have reported the same thing but I haven't found any more insight as to why this is the case.