Inheritance and initialization in Scala

I have two Scala classes that look like this (paraphrased):
abstract class GenericParser[T] {
val lineFilter : String => Boolean
parseData()
def parseData(): T = {
for( line <- .... if lineFilter(line) )
// do things
}
}
class SalesParser extends GenericParser[SalesRow] {
val lineFilter = (line: String) => !line.startsWith("//")
// ....
}
The problem is that lineFilter is null in parseData, presumably because parseData is called while the primary GenericParser constructor is still running, so the subclass hasn't fully initialized its members.
I can work around this by making lineFilter a def instead of a val, but is this expected behavior? It doesn't seem right that this problem should only become apparent after getting an NPE at runtime.

It is indeed the expected behavior, and is exactly the same problem as in this question:
Scala 2.8: how to initialize child class
You can basically copy-paste the answer from that question. Solutions include:
def or lazy val instead of val
early initialization of lineFilter
redesign of your classes to avoid the “virtual method call from superclass's constructor which accesses uninitialized subclass values” problem. For instance, why would you want to store the filter function in a val or return it from a def, when it could be implemented as a method?
abstract class GenericParser[T] {
def lineFilter(line: String): Boolean
parseData()
def parseData(): T = {
for( line <- .... if lineFilter(line) )
// do things
}
}
class SalesParser extends GenericParser[SalesRow] {
def lineFilter(line: String) = !line.startsWith("//")
}
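As a minimal sketch of why a def or lazy val sidesteps the problem, the following uses hypothetical names (Base, StrictChild, LazyChild, DefChild, sawNullDuringInit) that are not part of the original code. It assumes only that the superclass reads the member while its own constructor is still running:

```scala
// Hypothetical stand-ins for GenericParser / SalesParser.
abstract class Base {
  def lineFilter: String => Boolean
  // Evaluated inside Base's constructor, i.e. before subclass vals are assigned.
  val sawNullDuringInit: Boolean = lineFilter == null
}

// A strict val is assigned only after Base's constructor has finished.
class StrictChild extends Base {
  val lineFilter: String => Boolean = line => !line.startsWith("//")
}

// A lazy val is computed on first access, so Base already sees the real function.
class LazyChild extends Base {
  lazy val lineFilter: String => Boolean = line => !line.startsWith("//")
}

// A def has no backing field that could be read too early.
class DefChild extends Base {
  def lineFilter: String => Boolean = line => !line.startsWith("//")
}
```

Only the read made during Base's constructor observes null; once construction has finished, StrictChild's lineFilter behaves normally.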

Related

How to qualify methods as static in Scala?

I have a class
class MyClass {
def apply(myRDD: RDD[String]) {
val rdd2 = myRDD.map(myString => {
// do String manipulation
}
}
}
object MyClass {
}
Since I have a block of code performing one task (the area that says "do String manipulation"), I thought I should break it out into its own method. Since the method is not changing the state of the class, I thought I should make it a static method.
How do I do that?
I thought that you can just pop a method inside the companion object and it would be available as a static method, like this:
object MyClass {
def doStringManipulation(myString: String) = {
// do String manipulation
}
}
but when I try val rdd2 = myRDD.map(myString => { doStringManipulation(myString)}), Scala doesn't recognize the method and forces me to write MyClass.doStringManipulation(myString) in order to call it.
What am I doing wrong?
In Scala there are no static methods: all methods are defined over an object, be it an instance of a class or a singleton, as the one you defined in your question.
As you correctly pointed out, by having a class and an object with the same name in the same compilation unit you make the object a companion of the class, which means that the two have access to each other's private fields and methods. This does not mean that those members are available without specifying which object you are accessing.
You can either use the long form as mentioned (MyClass.doStringManipulation(myString)) or, if you think it makes sense, import the method into the class's scope, as follows:
import MyClass.doStringManipulation
class MyClass {
def apply(myRDD: RDD[String]): Unit = {
val rdd2 = myRDD.map(doStringManipulation)
}
}
object MyClass {
private def doStringManipulation(myString: String): String = {
???
}
}
As a side note, for the MyClass.apply method you used procedure syntax, a notation that is deprecated and is dropped in Scala 3:
// this is a shorthand for a method that returns `Unit` but is going to disappear
def method(parameter: Type) {
// does things
}
// this means the same, but it's going to stay
// the `=` is enough, even without the explicit return type
// unless, that is, you want to force the method to discard the last value and return `Unit`
def method(parameter: Type): Unit = {
// does things
}
You should follow Scala's advice and qualify the call:
val rdd2 = myRDD.map(MyClass.doStringManipulation)
Alternatively, write this import inside the class and the unqualified call will work as expected:
import MyClass._
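A Spark-free sketch of the import approach, with a plain List standing in for the RDD. Cleaner and normalize are hypothetical names, and the helper is left public here so the snippet does not depend on both definitions being compiled together as companions:

```scala
// Hypothetical example: a class that reuses a helper defined on its companion object.
class Cleaner {
  // Brings the companion's members into scope for the whole class body.
  import Cleaner._

  def apply(lines: List[String]): List[String] =
    lines.map(normalize) // no `Cleaner.` prefix needed thanks to the import
}

object Cleaner {
  // Stateless helper, the closest Scala equivalent of a static method.
  def normalize(s: String): String = s.trim.toLowerCase
}
```

Without the import, the call would have to be written as Cleaner.normalize, exactly as in the question.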

Scala - Add member variable to class from outside

Is it possible to add a member variable to a class from outside the class? (Or mimic this behavior?)
Here's an example of what I'm trying to do. I already use an implicit conversion to add additional functions to RDD, so I added a variable to ExtendedRDDFunctions. I'm guessing this doesn't work because the variable is lost after the conversion in a rdd.setMember(string) call.
Is there any way to get this kind of functionality? Is this the wrong approach?
implicit def toExtendedRDDFunctions(rdd: RDD[Map[String, String]]): ExtendedRDDFunctions = {
new ExtendedRDDFunctions(rdd)
}
class ExtendedRDDFunctions(rdd: RDD[Map[String, String]]) extends Logging with Serializable {
var member: Option[String] = None
def getMember(): String = member.getOrElse("")
def setMember(field: String): Unit = {
member = Some(field)
}
def queryForResult(query: String): String = {
// Uses member here
}
}
EDIT:
I am using these functions as follows: I first call rdd.setMember("state"), then rdd.queryForResult(expression).
Because the implicit conversion is applied each time you invoke a method defined in ExtendedRDDFunctions, there is a new instance of ExtendedRDDFunctions created for every call to setMember and queryForResult. Those instances do not share any member variables.
You have basically two options:
Maintain a Map[RDD, String] in ExtendedRDDFunctions's companion object which you use to assign the member value to an RDD in setMember. This is the evil option as you introduce global state and open pitfalls for a whole range of errors.
Create a wrapper class that contains your member value and is returned by the setMember method:
case class RDDWithMember(rdd: RDD[Map[String, String]], member: String) extends RDD[Map[String, String]] {
def queryForResult(query: String): String = {
// Uses member here
}
// methods of the RDD interface, just delegate to rdd
}
implicit class ExtendedRDDFunctions(rdd: RDD[Map[String, String]]) {
def setMember(field: String): RDDWithMember = {
RDDWithMember(rdd, field)
}
}
Besides avoiding global state, this approach is also more type safe because you cannot call queryForResult on instances that do not have a member. The only downsides are that you have to delegate all members of RDD and that queryForResult is not defined on RDD itself.
The first issue can probably be addressed with some macro magic (search for "delegate" or "proxy" and "macro").
The latter issue can be resolved by defining an additional extension method in ExtendedRDDFunctions that checks whether the RDD is an RDDWithMember:
implicit class ExtendedRDDFunctions(rdd: RDD[Map[String, String]]) {
def setMember(field: String): RDDWithMember = // ...
def queryForResult(query: String): Option[String] = rdd match {
case wm: RDDWithMember => Some(wm.queryForResult(query))
case _ => None
}
}
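The point that every extension-method call wraps the receiver in a fresh instance can be reproduced without Spark. This is only a sketch: TagDemo, Tagging, Tagged, and demo are hypothetical names, and a List[Int] stands in for the RDD:

```scala
object TagDemo {
  // Stand-in for ExtendedRDDFunctions, wrapping a List instead of an RDD.
  implicit class Tagging(private val xs: List[Int]) {
    var tag: Option[String] = None
    def setTag(t: String): Unit = { tag = Some(t) }
    def currentTag: String = tag.getOrElse("")
    // Wrapper approach: return a value that carries the tag along with the data.
    def withTag(t: String): Tagged = Tagged(xs, t)
  }

  final case class Tagged(xs: List[Int], tag: String) {
    def describe: String = s"$tag:${xs.sum}"
  }

  def demo(): (String, String) = {
    val xs = List(1, 2, 3)
    xs.setTag("state") // mutates a temporary Tagging that is discarded right away
    // currentTag runs on a second, fresh Tagging, so the tag set above is gone.
    (xs.currentTag, xs.withTag("state").describe)
  }
}
```

Here the first element of the result is the empty string, while the wrapper keeps its tag for as long as the returned Tagged value is held.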
import ExtendedRDDFunctions._
imports all fields and methods of the companion object so that they can be used in the body of your class.
For your use case, look into the delegation pattern.

Scala fails to initialize a val

I have found something odd in the following Scala program (sorry to include all the code, but you'll see why I added it all):
object md2html extends App {
private val DEFAULT_THEME = Themes.AMAZON_LIGHT
private val VALID_OPTIONS = Set("editorTheme", "logo", "style")
try {
// some code 1
} catch {
case t: Throwable => t.printStackTrace(); exitWithError(t.getMessage)
}
// some code 2 (method definitions only)
private def parseOption(key: String, value: String) = {
println(key + " " + VALID_OPTIONS)
if (! Set("theme","editorTheme", "logo", "style").contains(key)) exitWithError(s"$key is not a valid option")
if (key == "theme") Themes(value).toMap else Map(key.drop(2) -> value)
}
// some code 3 (method definitions only)
}
If VALID_OPTIONS is defined after one of the "some code" blocks, it evaluates to null in parseOption. I can see no good reason for that. I truncated the code for clarity, but if more code is required I'll be happy to add it.
EDIT: I looked into it a bit more, and here is what I found.
When extending App, the val is not initialized with this code
object Test extends App {
printTest
def printTest = println(test)
val test = "test"
}
With a regular main method, it works fine:
object Test {
def main(args: Array[String]): Unit = {
printTest
}
def printTest = println(test)
val test = "test"
}
I had overlooked that you use extends App. This is another pitfall in Scala, unfortunately:
object Foo extends App {
val bar = "bar"
}
Foo.bar // null!
Foo.main(Array())
Foo.bar // now initialized
The App trait defers the object's initialization to the invocation of the main method, so all the vals are null until the main method has been called.
In summary, the App trait and vals do not mix well. I have fallen into that trap many times. If you use App, avoid vals; if you have to use global state, use lazy vals instead.
Constructor bodies, and this goes for singleton objects as well, are evaluated strictly top to bottom. This is a common pitfall in Scala, unfortunately, because the position of a val definition matters whenever the val is referenced elsewhere in the constructor.
object Foo {
val rab = useBar // oops, involuntarily referring to uninitialized val
val bar = "bar"
def useBar: String = bar.reverse
}
Foo // NPE
Of course, in a better world, the Scala compiler would either disallow the above code, re-order the initialization, or at least warn you. But it doesn't...
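A small sketch of the top-to-bottom rule and of the lazy val workaround mentioned above. EagerOrder, LazyOrder, snapshot, and currentLabel are hypothetical names; Option(...) is used only to turn the uninitialized null into a None instead of triggering an NPE:

```scala
object EagerOrder {
  // Runs first: label has not been assigned yet, so currentLabel returns null.
  val snapshot: Option[String] = Option(currentLabel)
  val label: String = "bar"
  def currentLabel: String = label
}

object LazyOrder {
  // Runs first as well, but the lazy val is computed on demand at this point.
  val snapshot: Option[String] = Option(currentLabel)
  lazy val label: String = "bar"
  def currentLabel: String = label
}
```

Once initialization has finished, EagerOrder.label holds "bar" as expected; only the value captured during initialization is affected.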

How to store a method's vals without recreating them on every call

I have a Scala class whose methods use a lot of regexes. Each method uses its own set of regex patterns.
From the perspective of code modularity, I should keep those patterns inside the method:
class Bar {
def foo() {
val patt1 = "[ab]+".r
val patt2 = "[cd]+".r
/*...*/
}
}
But this approach is quite inefficient: the patterns are recompiled on each method call.
I could move them directly into the class:
class Bar {
val fooPatt1 = "[ab]+".r
val fooPatt2 = "[cd]+".r
/*...*/
}
but with 30 methods this gets ugly.
I ended up with a hybrid solution using a val and an anonymous function:
val z = {
val patt1 = "[ab]+".r
val patt2 = "[cd]+".r
() => { /* ... */ }
}
but I am not sure whether using a val to store a function has drawbacks compared to a def. Is there another clean way to store a method's constants without polluting the class?
Using a val is perfectly fine. There might be a (very) small performance hit, but in the vast majority (99.9%) of applications that's not a problem.
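For illustration, a filled-in version of the val-plus-closure idea could look like the sketch below. Tokens and countRuns are hypothetical names, and the body (counting pattern matches) is just a placeholder for whatever the real method does:

```scala
class Tokens {
  // The block runs once per instance; the returned function closes over the
  // compiled patterns, so calling countRuns never recompiles them.
  val countRuns: String => Int = {
    val abRun = "[ab]+".r
    val cdRun = "[cd]+".r
    s => abRun.findAllIn(s).size + cdRun.findAllIn(s).size
  }
}
```

The patterns stay invisible to the rest of the class, at the cost of countRuns being a function value rather than a method.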
You could also create a class for the method
// The extends is not needed, although you might want to hide the Foo type
class Foo extends (() => ...) {
val patt1 = "[ab]+".r
val patt2 = "[cd]+".r
def apply() = {
...
}
}
Then in the class:
class Bar {
val foo = new Foo
}
Another solution is using traits
trait Foo {
private lazy val patt1 = "[ab]+".r
private lazy val patt2 = "[cd]+".r
def foo() = ...
}
class Bar extends Foo with ...
Note that if you have several different methods like that in a single class, it can be a sign that the single responsibility principle is violated. Moving them to their own class (or trait) can be a solution for that problem as well.
I would put every method, together with the regex it needs, in its own trait:
class Bar extends AMethod with BMethod
trait AMethod {
private val aPattern = """\d+""".r
def aMethod(s: String) = aPattern.findFirstIn(s)
}
trait BMethod {
private val bPattern = """\w+""".r
def bMethod(s: String) = bPattern.findFirstIn(s)
}
clean
separated
easy to test (object AMethodSpec extends Properties("AMethod") with AMethod ...)
I took Chris's comment into account. Putting the patterns in the companion object is probably the most efficient approach, but it gets messy when there are many methods.
EECOLOR's solution is less efficient but cleaner. Traits prevent the patterns from being recreated on each method call. Unfortunately, Scala does not share the same compiled pattern across multiple class instances:
(new X).patt1==(new X).patt1 // would be false.
I've combined those two approaches, using objects instead of traits.
object X {
object method1 {
val patt1 = "a".r
}
object method2 {
val patt1 = "a".r
}
}
class X {
def method1 = {
import X.method1._
patt1
}
def method2 = {
import X.method2._
patt1
}
}
(new X).method1 == (new X).method1 // true
(new X).method2 == (new X).method2 // true
Although this approach works, I think Scala should provide a solution for this problem out of the box. Patterns are the simplest example; there could be other immutable objects whose initialization is much more expensive.
Extracting a method's internals to somewhere outside it still hurts clarity. It would be nice to handle this the way lazy vals work: adding a single modifier should ensure that the value is instantiated only once across all instances and method calls. It would look something like this:
def method1 {
static val x = new VeryExpensiveObject
}

Why can't I use this.getClass in auxiliary constructor in scala?

Why can't I use this.getClass in auxiliary constructor in scala? Are there any alternatives?
More specifically, I am trying to call LoggerFactory.getLogger of slf4j in the auxiliary constructor. I have a hack now where I am forced to pass a logger object to the constructor.
A simple contrived example (does not compile) which shows what I am trying to do:
class A (numbers : Double) {
val logger = LoggerFactory.getLogger(this.getClass)
def this(numbersAsStr: String) = this(try { numbersAsStr.toDouble } catch { case _ => LoggerFactory.getLogger(this.getClass).error("Failed to convert"); 0 })
}
This is actually a limitation of the JVM rather than specifically a Scala problem. Here's a similar example in Java:
public class ThisTest {
public final String name;
public ThisTest(String n) {
name = n;
}
public ThisTest() {
// trying to use `this` in a call to the primary constructor
this(this.getClass().getName());
}
}
When you try to compile it you get an error:
$ javac ThisTest.java
ThisTest.java:10: error: cannot reference this before supertype constructor has been called
this(this.getClass().getName());
^
1 error
The problem is that you're trying to reference this before any of the superclass constructors have been called. You cannot use a this reference inside a super() or this() call in any JVM language, because that's the way classes work on the JVM.
However, you can totally avoid this problem by restructuring your code to put the reference to this after the this() call:
class A (numbers: Double) {
val logger = LoggerFactory.getLogger(this.getClass)
def this(numbersAsStr: String) = {
this ( try { numbersAsStr.toDouble } catch { case _ => 0 } )
// note: as written, this statement runs on every call, not only when the conversion fails
LoggerFactory.getLogger(this.getClass).error("Failed to convert")
}
}
You might actually want access to the thrown exception for your log info. In that case, I'd just use LoggerFactory.getLogger(classOf[A]). That won't give you the actual class name if you're using inheritance (which I was assuming was the case here), but if you include the stack trace in the log then you should be able to figure it out.
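One alternative that the answer above does not mention is to move the parsing out of the constructor into a factory method on the companion object, where both the exception and a class-based logger are available. The sketch below is only an illustration under assumptions: Reading and fromString are hypothetical names, and java.util.logging stands in for slf4j so that the snippet has no external dependency:

```scala
import java.util.logging.Logger
import scala.util.{Failure, Success, Try}

class Reading(val value: Double)

object Reading {
  // classOf is resolved statically, so no `this` reference is needed.
  private val log = Logger.getLogger(classOf[Reading].getName)

  // Hypothetical factory replacing the String-taking auxiliary constructor.
  def fromString(raw: String): Reading = Try(raw.toDouble) match {
    case Success(d) => new Reading(d)
    case Failure(e) =>
      // The thrown exception is in scope here and can be included in the log.
      log.warning(s"Failed to convert '$raw': ${e.getMessage}")
      new Reading(0.0)
  }
}
```

As with classOf[A] in the answer, the logger name is fixed to Reading and will not reflect a subclass.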
Not sure I understand the question. Here is a guess:
class Foo(val c: Class[_]) {
def this() = this(classOf[Foo])
}
new Foo().c // -> class Foo