Scala lazy val interpretation

I am learning Scala basics and just came across the lazy val concept. I have the following code snippets, which work without errors or warnings.
Case 1
lazy val a = 10 + b
lazy val b = 5
println(a)
Case 2
lazy val a = 10 + b
val b = 5
println(a)
Case 3
val a = 10 + b
lazy val b = 5
println(a)
I understand how cases 1 and 2 work, but I don't understand how the code in case 3 works without an error or warning. How is Scala able to evaluate a when b is not yet defined?
EDIT
I am not running this code in the Scala REPL. I have saved the code from case 3 in a file called lazyVal.scala and I am executing it with scala lazyVal.scala. I think scala interprets the code in the file.
If I change the code in lazyVal.scala to
val a = 10 + b
val b = 5
println(a)
and execute it using scala lazyVal.scala, I do get a warning:
/Users/varun.risbud/scalaRepo/src/Chapter1/lazyVal.scala:1: warning: Reference to uninitialized value b
val a = 10 + b
^
one warning found
10
Also, if I change the code to create an object that extends App, it works:
object lazyVal extends App {
  val a = 10 + b
  lazy val b = 5
  println(a)
}
➜ Chapter1 scalac lazyVal.scala
➜ Chapter1 scala lazyVal
15
My scala version is 2.12.1 if that makes any difference.

Statements in a constructor execute in textual order, which is why you get a warning when the initialization of a refers to the uninitialized b. It's a common error to compose a class in a way that you don't even get the warning. (There's a FAQ tutorial about that.)
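The warning-evading composition could look like this (a hedged sketch; the class names are made up): the forward reference hides behind an overridable member, so the compiler only sees a method call.
class Base {
  val a = 10 + b      // compiles to a call of the accessor b, so no warning
  def b: Int = 0
}
class Derived extends Base {
  override val b = 5  // this field is assigned only after Base's constructor ran
}
// new Derived().a == 10, not 15: Base read b while the field still held 0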
The code from case 3 is forbidden in a local sequence of statements:
scala> :pa
// Entering paste mode (ctrl-D to finish)
locally {
  val a = 10 + b
  lazy val b = 5
  println(a)
}
// Exiting paste mode, now interpreting.
<console>:13: error: forward reference extends over definition of value a
val a = 10 + b
^
As a member of a class or object, the lazy member is evaluated "on demand", when a is evaluated during construction.
scala> :pa
// Entering paste mode (ctrl-D to finish)
object X {
  val a = 10 + b
  lazy val b = 5
  println(a)
}
// Exiting paste mode, now interpreting.
defined object X
scala> X
15
res1: X.type = X$@6a9344f5
The script runner packages your lines of code this way:
object X {
  def main(args: Array[String]): Unit =
    new AnyRef {
      val a = 10 + b
      lazy val b = 5
      println(a)
    }
}
If you give it an object with a main or that extends App, it won't wrap the code but just use it directly.
There are subtle differences between the three formulations. For example, the constructor of a top-level object is run as a static initializer; but an App is special-cased to run initializer code as main. (They're getting rid of App because it's confusing.)
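A small sketch of that difference (the object and member names here are illustrative): a plain object's body runs as a static initializer on first reference, while an App body is deferred until main is invoked.
object X {
  println("initializing X")  // runs in the static initializer,
  val a = 1                  // the first time X is referenced
}
object Main extends App {
  println("main started")    // App defers this body until main(args) runs
  println(X.a)               // X's initializer fires here, not at program start
}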

Related

Scala shell: declaring the same variable name multiple times

In the Scala shell, I can declare the same variable multiple times without getting any error or warning.
For example:
scala> val a = 1
a: Int = 1
scala> val a = 2
a: Int = 2
scala> val a = 1
a: Int = 1
scala> lazy val a = 1
a: Int = <lazy>
Here the variable name "a" is declared multiple times with val and lazy val (the same works with var).
So I would like to know:
How does the Scala compiler treat this? E.g. with val a = 1 followed by var a = 2, which takes precedence?
Why does the Scala shell accept declaring the same variable name multiple times?
How do I know whether a declared variable is mutable or immutable, given that the same name has been declared as both var and val?
Note: In IntelliJ I am able to declare the same variable multiple times and I don't see an error, but when accessing it the IDE shows the error "Cannot resolve variable". So what is the use of declaring the same variable multiple times?
In the REPL there is often experimenting and prototyping taking place, and redefining a val is most often not a mistake but intentional.
Precedence goes to whatever you typed last that compiled successfully:
scala> val a: Int = 7
a: Int = 7
scala> val a: Int = "foo"
<console>:12: error: type mismatch;
found : String("foo")
required: Int
val a: Int = "foo"
^
scala> a
res7: Int = 7
If you aren't sure whether a name is already in use, you may just type the name, like a in my case, and get feedback. For undeclared values, you get:
scala> b
<console>:13: error: not found: value b
b
^
But if you paste a block of code with :paste, conflicting definitions of the same name won't work, and the whole block is discarded.
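For example (a hedged illustration; the exact console line number varies):
scala> :paste
// Entering paste mode (ctrl-D to finish)
val a = 1
val a = 2
// Exiting paste mode, now interpreting.
<console>:12: error: a is already defined as value a
val a = 2
    ^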

Unable to run spark map function that reads a Tuple RDD and returns a Tuple RDD

I have a requirement to generate a paired RDD from another paired RDD. Basically, I am trying to write a map function that does the following.
RDD[Polygon,HashSet[Point]] => RDD[Polygon,Integer]
Here is the code I have written:
A Scala function that iterates over the HashSet and adds up a value from each Point object:
def outCountPerCell(jr: Tuple2[Polygon, HashSet[Point]]): Tuple2[Polygon, Integer] = {
  val setIter = jr._2.iterator()
  var outageCnt: Int = 0
  while (setIter.hasNext()) {
    outageCnt += setIter.next().getCoordinate().getOrdinate(2).toInt
  }
  return Tuple2(jr._1, Integer.valueOf(outageCnt))
}
Applying the function on the paired RDD throws an error:
scala> val mappedJoinResult = joinResult.map((t: Tuple2[Polygon,HashSet[Point]]) => outCountPerCell(t))
<console>:82: error: type mismatch;
found   : ((com.vividsolutions.jts.geom.Polygon, java.util.HashSet[com.vividsolutions.jts.geom.Point])) => (com.vividsolutions.jts.geom.Polygon, Integer)
required: org.apache.spark.api.java.function.Function[(com.vividsolutions.jts.geom.Polygon, java.util.HashSet[com.vividsolutions.jts.geom.Point]),?]
       val mappedJoinResult = joinResult.map((t: Tuple2[Polygon,HashSet[Point]]) => outCountPerCell(t))
Can someone take a look and see what I am missing, or share example code that uses a custom function inside a map() operation?
The problem here is that joinResult is a JavaPairRDD from the Java API. This data structure's map expects Java-style lambdas (Function), which are not (at least not trivially) interchangeable with Scala lambdas.
So there are two solutions: convert the given method into a Java Function to be passed to map (sketched at the end of this answer), or simply use the Scala RDD as the developers intended:
Setup Dummy Data
Here I create some stand-in classes and make a Java RDD with a similar structure to the OP's:
scala> case class Polygon(name: String)
defined class Polygon
scala> case class Point(ordinate: Int)
defined class Point
scala> :pa
// Entering paste mode (ctrl-D to finish)
/* More idiomatic method */
import scala.collection.JavaConverters._

def outCountPerCell(jr: (Polygon, java.util.HashSet[Point])): (Polygon, Integer) = {
  val count = jr._2.asScala.map(_.ordinate).sum
  (jr._1, count)
}
// Exiting paste mode, now interpreting.
import scala.collection.JavaConverters._
outCountPerCell: (jr: (Polygon, java.util.HashSet[Point]))(Polygon, Integer)
scala> val hs = new java.util.HashSet[Point]()
hs: java.util.HashSet[Point] = []
scala> hs.add(Point(2))
res13: Boolean = true
scala> hs.add(Point(3))
res14: Boolean = true
scala> import org.apache.spark.api.java.JavaPairRDD
import org.apache.spark.api.java.JavaPairRDD
scala> val javaRDD = new JavaPairRDD(sc.parallelize(Seq((Polygon("a"), hs))))
javaRDD: org.apache.spark.api.java.JavaPairRDD[Polygon,java.util.HashSet[Point]] = org.apache.spark.api.java.JavaPairRDD@14fc37a
Use Scala RDD
The underlying Scala RDD can be retrieved from the Java RDD by using .rdd:
scala> javaRDD.rdd.map(outCountPerCell).foreach(println)
(Polygon(a),5)
Even better: use mapValues with the Scala RDD
Since only the second part of each tuple changes, this problem can be cleanly solved with .mapValues:
scala> javaRDD.rdd.mapValues(_.asScala.map(_.ordinate).sum).foreach(println)
(Polygon(a),5)
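For completeness, here is a sketch of the first option (untested; it reuses the names defined above): wrap the method in Spark's Java Function interface, which is exactly the type named in the error message.
import org.apache.spark.api.java.function.{Function => JFunction}

// Adapt the Scala method to the Java functional interface that
// JavaPairRDD.map expects.
val asJavaFn = new JFunction[(Polygon, java.util.HashSet[Point]), (Polygon, Integer)] {
  override def call(t: (Polygon, java.util.HashSet[Point])): (Polygon, Integer) =
    outCountPerCell(t)
}

javaRDD.map(asJavaFn).rdd.foreach(println)   // prints (Polygon(a),5), as before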

Getting a null with a val depending on abstract def in a trait [duplicate]

This question already has answers here:
Scala - initialization order of vals
(3 answers)
Closed 7 years ago.
I'm seeing some initialization weirdness when mixing vals and defs in my trait. The situation can be summarized with the following example.
I have a trait which provides an abstract field, let's call it fruit, which should be implemented in child classes. It also uses that field in a val:
scala> class FruitTreeDescriptor(fruit: String) {
| def describe = s"This tree has loads of ${fruit}s"
| }
defined class FruitTreeDescriptor
scala> trait FruitTree {
| def fruit: String
| val descriptor = new FruitTreeDescriptor(fruit)
| }
defined trait FruitTree
When overriding fruit with a def, things work as expected:
scala> object AppleTree extends FruitTree {
| def fruit = "apple"
| }
defined object AppleTree
scala> AppleTree.descriptor.describe
res1: String = This tree has loads of apples
However, if I override fruit using a val...
scala> object BananaTree extends FruitTree {
| val fruit = "banana"
| }
defined object BananaTree
scala> BananaTree.descriptor.describe
res2: String = This tree has loads of nulls
What's going on here?
In simple terms, at the point you're calling:
val descriptor = new FruitTreeDescriptor(fruit)
the constructor for BananaTree has not been given the chance to run yet. This means the value of fruit is still null, even though it's a val.
This is a subcase of the well-known quirk of the non-declarative initialization of vals, which can be illustrated with a simpler example:
class A {
  val x = a
  val a = "String"
}
scala> new A().x
res1: String = null
(Although thankfully, in this particular case, the compiler will detect something being afoot and will present a warning.)
To avoid the problem, declare fruit as a lazy val, which forces its evaluation at the point of first access.
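A quick REPL sketch of that fix, continuing the definitions above (the res number is illustrative):
scala> object BananaTree extends FruitTree {
     |   lazy val fruit = "banana"
     | }
defined object BananaTree
scala> BananaTree.descriptor.describe
res3: String = This tree has loads of bananas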
The problem is the initialization order. val fruit = ... is being initialized after val descriptor = ..., so at the point when descriptor is being initialized, fruit is still null. You can fix this by making fruit a lazy val, because then it will be initialized on first access.
Your descriptor field is initialized earlier than the fruit field, because the trait is initialized before the class that extends it. null is a field's value before initialization; that's why you get it. In the def case there is no field access at all, just a method call, so everything is fine. See http://docs.scala-lang.org/tutorials/FAQ/initialization-order.html
Why is def so different? Because a def may be called several times, while a val is evaluated only once (its first and only evaluation is the initialization of the field).
The typical solution to such a problem is using a lazy val instead; it will initialize when you really need it. Another solution is early initializers, sketched below.
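An early-initializer version of the same fix might look like this (a hedged sketch; the res number is illustrative and this syntax is deprecated since Scala 2.13):
scala> object BananaTree extends {
     |   val fruit = "banana"
     | } with FruitTree
defined object BananaTree
scala> BananaTree.descriptor.describe
res4: String = This tree has loads of bananas
Here the early definition block initializes fruit before the trait's constructor runs, so descriptor sees the real value.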
Another, simpler example of what's going on:
scala> class A {val a = b; val b = 5}
<console>:7: warning: Reference to uninitialized value b
class A {val a = b; val b = 5}
^
defined class A
scala> (new A).a
res2: Int = 0 //null
Speaking more generally, Scala could theoretically analyze the dependency graph between fields (which field needs which other field) and start initialization from the leaf nodes. But in practice every module is compiled separately, and the compiler might not even know those dependencies (it might even be Java calling Scala calling Java), so it just does sequential initialization.
That is also why it can't even detect simple loops:
scala> class A {val a: Int = b; val b: Int = a}
<console>:7: warning: Reference to uninitialized value b
class A {val a: Int = b; val b: Int = a}
^
defined class A
scala> (new A).a
res4: Int = 0
scala> class A {lazy val a: Int = b; lazy val b: Int = a}
defined class A
scala> (new A).a
java.lang.StackOverflowError
Actually, such a loop (inside one module) could theoretically be detected in a separate build step, but it wouldn't help much, as the loop is pretty obvious anyway.

Defining a val in terms of the member of an abstract val (Scala bug?)

I came across a runtime error and was wondering whether this is a bug in Scala, or whether it shouldn't at least be caught at compile time.
This code produces a NullPointerException:
object Main extends App {
  trait A {
    val data: { val x: Int }
    val x = data.x
  }
  val a = new A {
    val data = new Object { val x = 42 }
  }
  a.x
}
Of course it's easy to fix by making A.x lazy or a def, but as obvious as that may be in this minimal example, in more realistic code it can be a little perplexing.
This is confusing when you encounter it for the first time, but it is expected behaviour.
The normal initialization order is that vals in super traits are initialized first. In your example this means that val x in trait A gets initialized before val data in your anonymous subclass, therefore causing the NullPointerException.
If you want to make your example work you have to use a feature called "Early Definitions" (5.1.6 in the language specification).
In your concrete example this is the syntax you'd need to use:
val a = new {
  val data = new Object { val x = 42 }
} with A
This initializes the data val before initializing the vals in A.
I had forgotten that this option is mentioned in the one-question FAQ.
$ scala -Xcheckinit
Welcome to Scala version 2.11.0-RC1 (OpenJDK 64-Bit Server VM, Java 1.7.0_25).
Type in expressions to have them evaluated.
Type :help for more information.
scala> object Main extends App {
| trait A {
| val data: { val x: Int }
| val x = data.x
| }
| val a = new A {
| val data = new Object { val x = 42 }
| }
| a.x
| }
<console>:15: warning: a pure expression does nothing in statement position; you may be omitting necessary parentheses
a.x
^
warning: there were 1 feature warning(s); re-run with -feature for details
defined object Main
scala> Main main null
scala.UninitializedFieldError: Uninitialized field: <console>: 13
at Main$$anon$1.data(<console>:13)
at Main$A$class.$init$(<console>:10)
... 43 elided

Make a Scala interpreter oblivious between interpret calls

Is it possible to configure a Scala interpreter (tools.nsc.IMain) so that it "forgets" the previously executed code, whenever I run the next interpret() call?
Normally when it compiles the sources, it wraps them in nested objects, so all the previously defined variables, functions and bindings are available.
It would suffice to not generate the nested objects (or to throw them away), although I would prefer a solution which would even remove the previously compiled classes from the class loader again.
Is there a setting, or a method, or something I can overwrite, or an alternative to IMain that would accomplish this? I need to be able to still access the resulting objects / classes from the host VM.
Basically I want to isolate subsequent interpret() calls without something as heavyweight as creating a new IMain for each iteration.
Here is one possible answer. Basically there is a method reset() which calls the following (mostly private things, so either you buy the whole package or not):
clearExecutionWrapper()
resetClassLoader()
resetAllCreators()
prevRequests.clear()
referencedNameMap.clear()
definedNameMap.clear()
virtualDirectory.clear()
In my case, I am using a custom execution wrapper, so that needs to be set up again; imports are also handled through a regular interpret cycle, so either add them again or, better, just prepend them via the execution wrapper.
I would like to keep my bindings, but they are also gone:
import tools.nsc._
import interpreter.IMain
object Test {
  private final class Intp(cset: nsc.Settings)
    extends IMain(cset, new NewLinePrintWriter(new ConsoleWriter, autoFlush = true)) {

    override protected def parentClassLoader = Test.getClass.getClassLoader
  }

  object Foo {
    def bar() { println("BAR") }
  }

  def run() {
    val cset = new nsc.Settings()
    cset.classpath.value += java.io.File.pathSeparator + sys.props("java.class.path")
    val i = new Intp(cset)
    i.initializeSynchronous()
    i.bind[Foo.type]("foo", Foo)
    val res0 = i.interpret("foo.bar(); val x = 33")
    println(s"res0: $res0")
    i.reset()
    val res1 = i.interpret("println(x)")
    println(s"res1: $res1")
    i.reset()
    val res2 = i.interpret("foo.bar()")
    println(s"res2: $res2")
  }
}
This will find Foo in the first iteration, correctly forget x in the second iteration, but then in the third iteration, it can be seen that the foo binding is also lost:
foo: Test.Foo.type = Test$Foo$@8bf223
BAR
x: Int = 33
res0: Success
<console>:8: error: not found: value x
println(x)
^
res1: Error
<console>:8: error: not found: value foo
foo.bar()
^
res2: Error
The following seems to be fine:
for (j <- 0 until 3) {
  val user = "foo.bar()"
  val synth = """import Test.{Foo => foo}
                |""".stripMargin + user
  val res = i.interpret(synth)
  println(s"res$j: $res")
  i.reset()
}